path: root/sys/netinet/tcp_output.c
Commit message | Author | Age | Files | Lines
...
* MFp4 (//depot/projects/tcpecn/): | Rui Paulo | 2008-07-31 | 1 | -0/+43
  TCP ECN support. Merge of my GSoC 2006 work for NetBSD. TCP ECN is defined in RFC 3168.
  Partly reviewed by: dwmalone, silby
  Obtained from: NetBSD
  Notes: svn path=/head/; revision=181056
* Fix typo in comment. | Rui Paulo | 2008-07-15 | 1 | -1/+1
  M tcp_output.c
  Notes: svn path=/head/; revision=180535
* Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to | Robert Watson | 2008-04-17 | 1 | -1/+1
  explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock remain exclusive, and all instances of inpcb lock acquisition are exclusive.
  This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code.
  MFC after: 3 months
  Tested by: kris (superset of committed patch)
  Notes: svn path=/head/; revision=178285
* Remove TCP options ordering assumptions in tcp_addoptions(). Ordering | Andre Oppermann | 2008-04-07 | 1 | -1/+11
  was changed in rev. 1.161 of tcp_var.h. All options now test for sufficient space in the TCP header before getting added.
  Reported by: Mark Atkinson <atkin901-at-yahoo.com>
  Tested by: Mark Atkinson <atkin901-at-yahoo.com>
  MFC after: 1 week
  Notes: svn path=/head/; revision=177988
* Remove now unnecessary comment. | Andre Oppermann | 2008-04-07 | 1 | -2/+0
  Notes: svn path=/head/; revision=177987
* Use #defines for TCP options padding after EOL to be consistent. | Andre Oppermann | 2008-04-07 | 1 | -2/+2
  Reviewed by: bz
  Notes: svn path=/head/; revision=177986
* Padding after EOL option must be zeros according to RFC793 but | Bjoern A. Zeeb | 2008-03-09 | 1 | -2/+10
  the NOPs used are 0x01. While we could simply pad with EOLs (which are 0x00), rather use an explicit 0x00 constant there to not confuse people with 'EOL padding'. Put in a comment saying just that.
  Problem discussed on: src-committers with andre, silby, dwhite as follow up to the rev. 1.161 commit of tcp_var.h.
  MFC after: 11 days
  Notes: svn path=/head/; revision=176978
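The padding rule described in this commit can be illustrated with a minimal sketch. The constant and function names below (`SK_OPT_EOL`, `SK_OPT_PAD`, `sk_terminate_options`) are hypothetical and are not taken from tcp_output.c; the sketch only assumes that the option block must end on a 4-byte boundary and that the caller has left room in the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not the kernel's definitions. */
#define SK_OPT_EOL 0x00 /* End of Option List */
#define SK_OPT_NOP 0x01 /* No-Operation, must not be used after EOL */
#define SK_OPT_PAD 0x00 /* explicit zero padding after EOL */

/*
 * Terminate an option block of optlen bytes so that it ends on a
 * 4-byte boundary: one EOL, then zero bytes. Returns the new length.
 * The caller must guarantee up to 3 spare bytes in opt[].
 */
static size_t
sk_terminate_options(uint8_t *opt, size_t optlen)
{
	if (optlen % 4 != 0) {
		opt[optlen++] = SK_OPT_EOL;
		while (optlen % 4 != 0)
			opt[optlen++] = SK_OPT_PAD;
	}
	return (optlen);
}
```

Keeping a separate padding constant, even though it has the same value as EOL, documents that the trailing bytes are padding rather than further options.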
* Centralize and correct computation of TCP-MD5 signature offset within | Bjoern A. Zeeb | 2007-11-30 | 1 | -8/+3
  the packet (tcp header options field).
  Reviewed by: tools/regression/netinet/tcpconnect
  MFC after: 3 days
  Tested by: Nick Hilliard (see net@)
  Notes: svn path=/head/; revision=174120
* Let opt be an array. Though &opt[0] == opt == &opt, &opt is highly | Bjoern A. Zeeb | 2007-11-28 | 1 | -1/+1
  confusing and hard to understand so change it to just opt and remove the extra cast no longer/not needed.
  Discussed with: rwatson
  MFC after: 3 days
  Notes: svn path=/head/; revision=174023
* Make TSO work with IPSEC compiled into the kernel. | Bjoern A. Zeeb | 2007-11-21 | 1 | -3/+16
  The lookup hurts a bit for connections but had been there anyway if IPSEC was compiled in. So moving the lookup up a bit gives us TSO support at no extra cost.
  PR: kern/115586
  Tested by: gallatin
  Discussed with: kmacy
  MFC after: 2 months
  Notes: svn path=/head/; revision=173835
* Merge first in a series of TrustedBSD MAC Framework KPI changes | Robert Watson | 2007-10-24 | 1 | -1/+1
  from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
    mac_<object>_<method/action>
    mac_<object>_check_<method/action>
  The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
  All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI.
  Sponsored by: SPARTA (original patches against Mac OS X)
  Obtained from: TrustedBSD Project, Apple Computer
  Notes: svn path=/head/; revision=172930
* Add FBSDID to all files in netinet so that people can more | Mike Silbersack | 2007-10-07 | 1 | -1/+3
  easily include file version information in bug reports.
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=172467
* Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC | George V. Neville-Neil | 2007-07-03 | 1 | -3/+3
  option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC.
  Approved by: re
  Sponsored by: Secure Computing
  Notes: svn path=/head/; revision=171167
* Commit IPv6 support for FAST_IPSEC to the tree. | George V. Neville-Neil | 2007-07-01 | 1 | -6/+1
  This commit includes only the kernel files, the rest of the files will follow in a second commit.
  Reviewed by: bz
  Approved by: re
  Supported by: Secure Computing
  Notes: svn path=/head/; revision=171133
* Make the handling of the tcp window explicit for the SYN_SENT case | Andre Oppermann | 2007-06-09 | 1 | -4/+10
  in tcp_output(). This is currently not strictly necessary but paves the way to simplify the entire SYN options handling quite a bit. Clarify comment. No change in effective behaviour with this commit.
  RFC1323 requires the window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself never be scaled.
  Notes: svn path=/head/; revision=170470
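The RFC1323 rule quoted in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code in tcp_output(): the function name `sk_window_field` and its parameters are hypothetical, and the shift count is assumed to be at most 14 as RFC1323 requires.

```c
#include <stdint.h>

/*
 * Compute the 16-bit window field for an outgoing segment.
 * recwin is the receive window in bytes, rcv_scale the window scale
 * shift this side advertised (assumed <= 14), is_syn is non-zero for
 * <SYN> and <SYN,ACK> segments. On a SYN the window is never scaled.
 */
static uint16_t
sk_window_field(uint32_t recwin, unsigned int rcv_scale, int is_syn)
{
	uint32_t w = is_syn ? recwin : (recwin >> rcv_scale);

	if (w > 65535)
		w = 65535; /* the field is only 16 bits wide */
	return ((uint16_t)w);
}
```

With a 256K receive window and a shift of 3, a SYN would carry 65535 while later segments carry 32768, which the peer scales back up to 256K.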
* Don't send pure window updates when the peer has closed the connection | Andre Oppermann | 2007-06-09 | 1 | -1/+4
  and won't ever send more data.
  Notes: svn path=/head/; revision=170467
* Fix statistical accounting for bytes and packets during sack retransmits. | John Baldwin | 2007-05-18 | 1 | -1/+1
  MFC after: 1 week
  Submitted by: mohans
  Notes: svn path=/head/; revision=169682
* Fix an incorrect replace of a timer reference made during the TCP timer | Andre Oppermann | 2007-05-10 | 1 | -1/+1
  rewrite in rev. 1.132. This unmasked yet another bug that causes certain connections to get indefinitely stuck in LAST_ACK state.
  Notes: svn path=/head/; revision=169457
* Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of | Andre Oppermann | 2007-05-06 | 1 | -4/+6
  a dedicated sack_enable int for this bool. Change all users accordingly.
  Notes: svn path=/head/; revision=169317
* o Remove unused and redundant TCP option definitions | Andre Oppermann | 2007-04-20 | 1 | -4/+4
  o Replace usage of MAX_TCPOPTLEN with the correctly constructed and derived MAX_TCPOPTLEN
  Notes: svn path=/head/; revision=168904
* Change the TCP timer system from using the callout system five times | Andre Oppermann | 2007-04-11 | 1 | -28/+25
  directly to a merged model where only one callout, the next to fire, is registered.
  Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout.
  The single new callout is a mutex callout on inpcb simplifying the locking a bit.
  tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions.
  Reviewed by: rwatson (earlier version)
  Notes: svn path=/head/; revision=168615
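The merged timer model, in which only the next timer to fire is registered, can be sketched in a few lines. This is not the tcp_timer_activate() implementation; `SK_NTIMERS`, `SK_TIMER_IDLE` and `sk_next_timer` are hypothetical names, and deadlines are assumed to be absolute tick counts.

```c
#include <stdint.h>

#define SK_NTIMERS    5          /* five logical timers, per the commit */
#define SK_TIMER_IDLE UINT64_MAX /* marks a timer that is not armed */

/*
 * Return the index of the armed timer with the earliest deadline,
 * or -1 if no timer is armed. In a merged model only this one
 * deadline would be handed to the underlying callout.
 */
static int
sk_next_timer(const uint64_t deadline[SK_NTIMERS])
{
	int best = -1;

	for (int i = 0; i < SK_NTIMERS; i++) {
		if (deadline[i] == SK_TIMER_IDLE)
			continue;
		if (best < 0 || deadline[i] < deadline[best])
			best = i;
	}
	return (best);
}
```

Whenever a logical timer is armed or stopped, the selection is recomputed and the single callout is rescheduled if the earliest deadline changed.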
* Retire unused TCP_SACK_DEBUG. | Andre Oppermann | 2007-04-04 | 1 | -1/+0
  Notes: svn path=/head/; revision=168364
* ANSIfy function declarations and remove register keywords for variables. | Andre Oppermann | 2007-03-21 | 1 | -2/+1
  Consistently apply style to all function declarations.
  Notes: svn path=/head/; revision=167785
* Subtract optlen in the maximum length check for TSO and finally avoid | Andre Oppermann | 2007-03-21 | 1 | -1/+1
  slightly oversized TSO mbuf chains.
  Submitted by: kmacy
  Notes: svn path=/head/; revision=167780
* Match up SYSCTL_INT declarations in style. | Andre Oppermann | 2007-03-19 | 1 | -2/+2
  Notes: svn path=/head/; revision=167718
* Maintain a pointer and offset pair into the socket buffer mbuf chain to | Andre Oppermann | 2007-03-19 | 1 | -3/+13
  avoid traversal of the entire socket buffer for larger offsets on stream sockets.
  Adjust tcp_output() to make use of it.
  Tested by: gallatin
  Notes: svn path=/head/; revision=167715
* Consolidate insertion of TCP options into a segment from within tcp_output() | Andre Oppermann | 2007-03-15 | 1 | -145/+198
  and syncache_respond() into its own generic function tcp_addoptions().
  tcp_addoptions() is alignment agnostic and does optimal packing in all cases.
  In struct tcpopt rename to_requested_s_scale to just to_wscale.
  Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled."
  Reviewed by: silby, mohans, julian
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=167606
* Prevent TSO mbuf chain from overflowing a few bytes by subtracting the | Andre Oppermann | 2007-03-01 | 1 | -2/+3
  TCP options size before the TSO total length calculation.
  Bug found by: kmacy
  Notes: svn path=/head/; revision=167139
* Add EHOSTDOWN and ENETUNREACH to the list of soft errors, that shouldn't | Gleb Smirnoff | 2007-02-28 | 1 | -0/+2
  be returned up to the caller.
  PR: 100172
  Submitted by: "Andrew - Supernews" <andrew supernews.net>
  Reviewed by: rwatson, bms
  Notes: svn path=/head/; revision=167107
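The idea of a list of soft errors handled in a switch() block can be sketched as below. Only the two errno values named in this commit are listed; the actual list in tcp_output() presumably contains further entries and does more than filter the return value. The function name `sk_error_for_caller` is hypothetical.

```c
#include <errno.h>

/*
 * Map an error from the IP output path to the value passed up to
 * the caller. Soft errors are swallowed (0 is returned) so that the
 * retransmit machinery deals with them; anything else is passed on.
 */
static int
sk_error_for_caller(int error)
{
	switch (error) {
	case EHOSTDOWN:   /* soft error named in the commit */
	case ENETUNREACH: /* soft error named in the commit */
		return (0);
	default:
		return (error);
	}
}
```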
* Toss the code, that handles errors from ip_output(), to make it more | Gleb Smirnoff | 2007-02-28 | 1 | -30/+26
  readable:
  - Merge two embedded if() into one.
  - Introduce switch() block to handle different kinds of errors.
  Reviewed by: rwatson, bms
  Notes: svn path=/head/; revision=167106
* Auto sizing TCP socket buffers. | Andre Oppermann | 2007-02-01 | 1 | -4/+70
  Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen:
  a) your socket buffers are too small and you can't reach the full potential of the network between both hosts;
  b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around.
  With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions.
  FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size.
  New sysctls are:
    net.inet.tcp.sendbuf_auto=1 (enabled)
    net.inet.tcp.sendbuf_inc=8192 (8K, step size)
    net.inet.tcp.sendbuf_max=262144 (256K, growth limit)
    net.inet.tcp.recvbuf_auto=1 (enabled)
    net.inet.tcp.recvbuf_inc=16384 (16K, step size)
    net.inet.tcp.recvbuf_max=262144 (256K, growth limit)
  Tested by: many (on HEAD and RELENG_6)
  Approved by: re
  MFC after: 1 month
  Notes: svn path=/head/; revision=166405
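The transfer rates quoted in this commit follow from the rule that at most one socket buffer of data can be in flight per round trip. A minimal sketch of that arithmetic, with the hypothetical helper `sk_max_throughput_bps` and the assumption that the send buffer is the only limit:

```c
#include <stdint.h>

/*
 * Upper bound on throughput in bits per second when at most
 * buf_bytes can be in flight per round trip of rtt_ms milliseconds.
 * rtt_ms must be non-zero.
 */
static uint64_t
sk_max_throughput_bps(uint64_t buf_bytes, uint64_t rtt_ms)
{
	return (buf_bytes * 8 * 1000 / rtt_ms);
}
```

For the 32K default this gives 2621440 bit/s at 100ms and 1310720 bit/s at 200ms; for the 256K growth limit it gives 20971520 bit/s at 100ms and 10485760 bit/s at 200ms, in line with the rounded figures in the commit message.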
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h | Robert Watson | 2006-10-22 | 1 | -1/+2
  begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
  This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
  Obtained from: TrustedBSD Project
  Sponsored by: SPARTA
  Notes: svn path=/head/; revision=163606
* When tcp_output() receives an error upon sending a packet it reverts parts | Andre Oppermann | 2006-09-28 | 1 | -2/+15
  of its internal state to ignore the failed send and try again a bit later. If the error is EPERM the packet got blocked by the local firewall and the revert may cause the session to get stuck and retry indefinitely. This way we treat it like a packet loss and let the retransmit timer and timeouts do their work over time.
  The correct behavior is to drop a connection that gets an EPERM error. However this _may_ introduce some POLA problems and a two commit approach was chosen.
  Discussed with: glebius
  PR: kern/25986
  PR: kern/102653
  Notes: svn path=/head/; revision=162739
* When doing TSO correctly do the check to prevent a maximum sized IP packet | Andre Oppermann | 2006-09-28 | 1 | -1/+1
  from overflowing.
  Notes: svn path=/head/; revision=162725
* When doing TSO subtract hdrlen from TCP_MAXWIN to prevent ip->ip_len | Andre Oppermann | 2006-09-15 | 1 | -5/+7
  from wrapping when we generate a maximally sized packet for later segmentation.
  Noticed by: gallatin
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=162325
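Why the header length has to be subtracted can be shown with a small sketch. It assumes a 16-bit IP total length field and a burst limit of 65535 bytes; `SK_MAXWIN` and `sk_tso_clamp_len` are hypothetical names, and hdrlen is assumed to cover the IP and TCP headers including TCP options.

```c
#include <stdint.h>

#define SK_MAXWIN 65535u /* assumed burst limit; also the largest 16-bit length */

/*
 * Clamp the payload length of a TSO burst so that payload plus
 * headers still fits into a 16-bit IP length field.
 * hdrlen must not exceed SK_MAXWIN.
 */
static uint32_t
sk_tso_clamp_len(uint32_t len, uint32_t hdrlen)
{
	uint32_t limit = SK_MAXWIN - hdrlen;

	return (len > limit ? limit : len);
}
```

Clamping the payload to 65535 alone would let a 65535-byte burst plus 40 bytes of headers reach 65575, which truncates to 39 in a 16-bit field.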
* Rewrite of TCP syncookies to remove locking requirements and to enhance | Andre Oppermann | 2006-09-13 | 1 | -1/+1
  functionality:
  - Remove a rwlock acquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead.
  - Syncookie secrets are different for and stored per syncache bucket row. Secrets expire after 16 seconds and are reseeded on-demand.
  - The computational overhead for syncookie generation and verification is one MD5 hash computation as before.
  - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1.
  This implementation extends the original idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled.
  The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK.
  A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c.
  Reviewed by: silby (slightly earlier version)
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=162277
* Second step of TSO (TCP segmentation offload) support in our network stack. | Andre Oppermann | 2006-09-07 | 1 | -15/+73
  TSO is only used if we are in a pure bulk sending state. The presence of TCP-MD5, SACK retransmits, SACK advertisements, IPSEC and IP options prevent using TSO. With TSO the TCP header is the same (except for the sequence number) for all generated packets. This makes it impossible to transmit any options which vary per generated segment or packet.
  The length of TSO bursts is limited to TCP_MAXWIN.
  The sysctl net.inet.tcp.tso globally controls the use of TSO and is enabled.
  TSO enabled sends originating from tcp_output() have the CSUM_TCP and CSUM_TSO flags set, m_pkthdr.csum_data filled with the header pseudo-checksum and m_pkthdr.tso_segsz set to the segment size (net payload size, not counting IP+TCP headers or TCP options).
  IPv6 currently lacks a pseudo-header checksum function and thus doesn't support TSO yet.
  Tested by: Jack Vogel <jfvogel-at-gmail.com>
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=162110
* This patch fixes the problem where the current TCP code can not handle | Qing Li | 2006-02-23 | 1 | -1/+2
  simultaneous open. Both the bug and the patch were verified using the ANVL test suite.
  PR: kern/74935
  Submitted by: qingli (before I became committer)
  Reviewed by: andre
  MFC after: 5 days
  Notes: svn path=/head/; revision=155961
* Consolidate all IP Options handling functions into ip_options.[ch] and | Andre Oppermann | 2005-11-18 | 1 | -0/+1
  include ip_options.h into all files making use of IP Options functions.
  From ip_input.c rev 1.306:
    ip_dooptions(struct mbuf *m, int pass)
    save_rte(m, option, dst)
    ip_srcroute(m0)
    ip_stripoptions(m, mopt)
  From ip_output.c rev 1.249:
    ip_insertoptions(m, opt, phlen)
    ip_optcopy(ip, jp)
    ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)
  No functional changes in this commit.
  Discussed with: rwatson
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=152592
* Retire MT_HEADER mbuf type and change its users to use MT_DATA. | Andre Oppermann | 2005-11-02 | 1 | -2/+2
  Having an additional MT_HEADER mbuf type is superfluous and redundant as nothing depends on it. It only adds a layer of confusion. The distinction between header mbuf's and data mbuf's is solely done through the m->m_flags M_PKTHDR flag.
  Non-native code is not changed in this commit. For compatibility MT_HEADER is mapped to MT_DATA.
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=151967
* Replace t_force with a t_flag (TF_FORCEDATA). | Paul Saab | 2005-05-21 | 1 | -6/+8
  Submitted by: Raja Mukerji.
  Reviewed by: Mohan, Silby, Andre Oppermann.
  Notes: svn path=/head/; revision=146463
* When looking for the next hole to retransmit from the scoreboard, | Paul Saab | 2005-05-11 | 1 | -2/+7
  or to compute the total retransmitted bytes in this sack recovery episode, the scoreboard is traversed. While in sack recovery, this traversal occurs on every call to tcp_output(), every dupack and every partial ack. The scoreboard could potentially get quite large, making this traversal expensive.
  This change optimizes this by storing hints (for the next hole to retransmit and the total retransmitted bytes in this sack recovery episode) reducing the complexity to find these values from O(n) to constant time.
  The debug code that sanity checks the hints against the computed value will be removed eventually.
  Submitted by: Mohan Srinivasan, Noritoshi Demizu, Raja Mukerji.
  Notes: svn path=/head/; revision=146123
* Fix for interaction problems between TCP SACK and TCP Signature. | Paul Saab | 2005-04-21 | 1 | -45/+84
  If TCP Signatures are enabled, the maximum allowed sack blocks aren't going to fit. The fix is to compute how many sack blocks fit and tack these on last. Also on SYNs, defer padding until after the SACK PERMITTED option has been added.
  Found by: Mohan Srinivasan.
  Submitted by: Mohan Srinivasan, Noritoshi Demizu.
  Reviewed by: Raja Mukerji.
  Notes: svn path=/head/; revision=145372
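Computing how many SACK blocks fit comes down to simple arithmetic on the 40 bytes of TCP option space. The sketch below is an illustration, not the code from this commit: the names are hypothetical, alignment NOPs in front of the SACK option are ignored, and the per-option sizes in the usage note (12 bytes for padded timestamps, 20 bytes for a padded signature) are assumptions.

```c
#define SK_TCP_OPT_SPACE  40 /* maximum bytes of TCP options in a header */
#define SK_SACK_HDR_LEN    2 /* SACK option kind + length bytes */
#define SK_SACK_BLOCK_LEN  8 /* two 32-bit sequence numbers per block */

/*
 * Given the option bytes already used by other options, return how
 * many of the blocks_wanted SACK blocks can still be appended.
 */
static int
sk_sack_blocks_that_fit(int optlen_used, int blocks_wanted)
{
	int room = SK_TCP_OPT_SPACE - optlen_used - SK_SACK_HDR_LEN;
	int fit;

	if (room < SK_SACK_BLOCK_LEN)
		return (0);
	fit = room / SK_SACK_BLOCK_LEN;
	return (fit < blocks_wanted ? fit : blocks_wanted);
}
```

Under these assumptions, timestamps alone leave room for 3 blocks, a signature alone for 2, and both together for none, which is why a fixed maximum block count cannot work once signatures are enabled.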
* Ignore ICMP Source Quench messages for TCP sessions. Source Quench is | Andre Oppermann | 2005-04-21 | 1 | -1/+1
  ineffective, deprecated and can be abused to degrade the performance of active TCP sessions if spoofed.
  Replace a bogus call to tcp_quench() in tcp_output() with the direct equivalent tcpcb variable assignment.
  Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.1
  MFC after: 3 days
  Notes: svn path=/head/; revision=145355
* Fix a TCP SACK related crash resulting from incorrect computation | Paul Saab | 2005-01-12 | 1 | -6/+16
  of len in tcp_output(), in the case where the FIN has already been transmitted. The mis-computation of len is because of a gcc optimization issue, which this change works around.
  Submitted by: Mohan Srinivasan
  Notes: svn path=/head/; revision=140138
* /* -> /*- for license, minor formatting changes | Warner Losh | 2005-01-07 | 1 | -1/+1
  Notes: svn path=/head/; revision=139823
* Fixes a bug in SACK causing us to send data beyond the receive window. | Paul Saab | 2004-11-29 | 1 | -2/+4
  Found by: Pawel Worach and Daniel Hartmeier
  Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
  Notes: svn path=/head/; revision=138199
* Remove RFC1644 T/TCP support from the TCP side of the network stack. | Andre Oppermann | 2004-11-02 | 1 | -82/+2
  A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706
  Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality.
  Discussed on: -arch
  Notes: svn path=/head/; revision=137139
* Correct a bug in TCP SACK that could result in wedging of the TCP stack | Robert Watson | 2004-10-30 | 1 | -2/+2
  under high load: only set function state to loop and continue sending if there is no data left to send.
  RELENG_5_3 candidate.
  Feet provided: Peter Losher <Peter underscore Losher at isc dot org>
  Diagnosed by: Daniel Hartmeier <daniel at benzedrine dot cx>
  Submitted by: mohan <mohans at yahoo-inc dot com>
  Notes: svn path=/head/; revision=137066
* Acquire the send socket buffer lock around tcp_output() activities | Robert Watson | 2004-10-09 | 1 | -2/+14
  reaching into the socket buffer. This prevents a number of potential races, including dereferencing of sb_mb while unlocked leading to a NULL pointer deref (how I found it). Potentially this might also explain other "odd" TCP behavior on SMP boxes (although haven't seen it reported).
  RELENG_5 candidate.
  Notes: svn path=/head/; revision=136327