path: root/sys/netinet/tcp_output.c
Commit message | Author | Age | Files | Lines
...
* Fix the interaction between 'ICMP fragmentation needed' MTU updates, path MTU discovery and the tcp_minmss limiter for very small MTUs.
  When the MTU suggested by the gateway via ICMP (or, if there isn't one, the next smaller step from ip_next_mtu()) is lower than the floor enforced by net.inet.tcp.minmss (default 216), the value is ignored and the default MSS (512) is used instead. However, the DF flag in the IP header is still set in tcp_output(), preventing fragmentation by the gateway.
  Fix this by using tcp_minmss as the MSS and clearing the DF flag if the suggested MTU is too low. This turns off path MTU discovery for the remainder of the session and allows fragmentation to be done by the gateway.
  Only MTUs smaller than 256 are affected. The smallest official MTU specified is for AX.25 packet radio at 256 octets.
  PR: kern/146628
  Tested by: Matthew Luckie <mjl-at-luckie org nz>
  MFC after: 1 week
  Notes: svn path=/head/; revision=211333
  Author: Andre Oppermann | Age: 2010-08-15 | Files: 1 | Lines: -1/+3
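The decision described in this commit can be sketched as a small C function. This is only an illustration of the rule as stated in the message, not the code from tcp_output.c; the struct and function names (pmtu_decision, pmtu_update) are hypothetical, and the floor is passed in as a parameter rather than read from the sysctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of reacting to an ICMP "fragmentation needed" message. */
struct pmtu_decision {
	int  mss;     /* MSS to use from now on */
	bool set_df;  /* keep setting the DF flag on outgoing segments? */
};

/*
 * suggested_mss: MSS derived from the MTU suggested by the gateway.
 * minmss:        floor, standing in for net.inet.tcp.minmss.
 *
 * If the suggestion is below the floor, use the floor as the MSS and
 * stop setting DF so the gateway is allowed to fragment. Otherwise
 * accept the suggestion and keep path MTU discovery active.
 */
static struct pmtu_decision
pmtu_update(int suggested_mss, int minmss)
{
	struct pmtu_decision d;

	assert(minmss > 0);
	if (suggested_mss < minmss) {
		d.mss = minmss;
		d.set_df = false;
	} else {
		d.mss = suggested_mss;
		d.set_df = true;
	}
	return d;
}
```

With the default floor of 216, a suggestion of 100 yields an MSS of 216 with DF cleared, while a suggestion of 1460 is accepted unchanged with DF still set.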
* When using TSO and sending more than TCP_MAXWIN, sendalot is set and we loop back to 'again'. If the remainder is less than or equal to one full segment, the TSO flag was not cleared even though it isn't necessary anymore. Enabling the TSO flag on a segment that doesn't require any offloaded segmentation by the NIC may cause confusion in the driver or hardware.
  Reset the internal tso flag in tcp_output() on every iteration of sendalot.
  PR: kern/132832
  Submitted by: Renaud Lienhart <renaud-at-vmware com>
  MFC after: 1 week
  Notes: svn path=/head/; revision=211317
  Author: Andre Oppermann | Age: 2010-08-14 | Files: 1 | Lines: -2/+5
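A minimal sketch of the pattern this fix describes: a per-pass flag that is reset at the top of the loop rather than carried over from the previous pass. The function name and parameters are hypothetical, and the loop is a simplified stand-in for the 'again'/sendalot loop, not the actual tcp_output() logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count how many passes would be handed to the NIC with the TSO flag set.
 * total:    bytes waiting to be sent
 * maxwin:   most that one pass may send (stands in for TCP_MAXWIN)
 * seg_size: size of one full segment
 */
static int
count_tso_passes(long total, long maxwin, long seg_size)
{
	int tso_passes = 0;
	long remaining = total;

	assert(maxwin > 0 && seg_size > 0);
	while (remaining > 0) {
		bool tso = false;  /* reset on every pass, as in the fix */
		long len = remaining < maxwin ? remaining : maxwin;

		/* Offload is only useful for more than one full segment. */
		if (len > seg_size)
			tso = true;
		if (tso)
			tso_passes++;
		remaining -= len;
	}
	return tso_passes;
}
```

If the flag were declared outside the loop and never cleared, the final short pass would inherit it from the previous pass, which is the situation the commit message describes.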
* MFP4: @176978-176982, 176984, 176990-176994, 177441
  "Whitespace" churn after the VIMAGE/VNET whirls.
  Remove the need for some "init" functions within the network stack, like pim6_init() and icmp_init(), or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed.
  Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOBALS (before r185088) and try to reduce the diff to stable/7 and earlier as much as possible, to help out-of-tree consumers update from 6.x or 7.x to 8 or 9.
  This also removes some header file pollution for putatively static global variables.
  Revert VIMAGE specific changes in ipfilter::ip_auth.c that are no longer needed.
  Reviewed by: jhb
  Discussed with: rwatson
  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  MFC after: 6 days
  Notes: svn path=/head/; revision=207369
  Author: Bjoern A. Zeeb | Age: 2010-04-29 | Files: 1 | Lines: -9/+12
* Don't clear other flags (e.g. CSUM_TCP) when setting CSUM_TSO. This was causing TSO to break for the Xen netfront driver.
  Reviewed by: gibbs, rwatson
  MFC after: 7 days
  Notes: svn path=/head/; revision=206844
  Author: Kenneth D. Merry | Age: 2010-04-19 | Files: 1 | Lines: -1/+1
* Several years ago a feature was added to TCP that caused soreceive() to send an ACK right away if data was drained from a TCP socket that had previously advertised a zero-sized window. The current code requires the receive window to be exactly zero for this to kick in. If window scaling is enabled and the window is smaller than the scale, then the effective window that is advertised is zero. However, in that case the zero-sized window handling is not enabled because the window is not exactly zero. The fix changes the code to check the raw window value against zero.
  Reviewed by: bz
  MFC after: 1 week
  Notes: svn path=/head/; revision=198990
  Author: John Baldwin | Age: 2009-11-06 | Files: 1 | Lines: -1/+1
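The effect described above follows from how window scaling works: the value placed in the TCP header is the available space shifted right by the scale factor, so a small non-zero space can be advertised as zero. The sketch below shows one reading of the fix, testing the value that actually goes on the wire. The function name is hypothetical and this is not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * recv_space: free receive buffer space in bytes
 * rcv_scale:  window scale shift in use (0..14)
 *
 * Returns true if the window the peer sees is zero, which can happen
 * even when recv_space itself is non-zero.
 */
static bool
advertised_window_is_zero(uint32_t recv_space, unsigned rcv_scale)
{
	assert(rcv_scale <= 14);
	return (recv_space >> rcv_scale) == 0;
}
```

With a shift of 7, 100 bytes of free space is advertised as a zero window; testing the unscaled space against zero would miss that case.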
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h; we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.
  Reviewed by: bz
  Approved by: re (vimage blanket)
  Notes: svn path=/head/; revision=196019
  Author: Robert Watson | Age: 2009-08-01 | Files: 1 | Lines: -1/+1
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables.
  Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker.
  Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided.
  This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS.
  Bump __FreeBSD_version and update UPDATING.
  Portions submitted by: bz
  Reviewed by: bz, zec
  Discussed with: gnn, jamie, jeff, jhb, julian, sam
  Suggested by: peter
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=195699
  Author: Robert Watson | Age: 2009-07-14 | Files: 1 | Lines: -39/+37
* Trim extra sets of ()'s.
  Requested by: bde
  Notes: svn path=/head/; revision=194305
  Author: John Baldwin | Age: 2009-06-16 | Files: 1 | Lines: -1/+1
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
  Discussed with: pjd
  Notes: svn path=/head/; revision=193511
  Author: Robert Watson | Age: 2009-06-05 | Files: 1 | Lines: -1/+0
* Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and TCPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures.
  MFC after: 3 days
  Notes: svn path=/head/; revision=190948
  Author: Robert Watson | Age: 2009-04-11 | Files: 1 | Lines: -16/+16
* Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files.
  For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h.
  Reviewed by: brooks, gnn, des, zec, imp
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=185571
  Author: Bjoern A. Zeeb | Age: 2008-12-02 | Files: 1 | Lines: -0/+2
* Replace most INP_CHECK_SOCKAF() uses checking if it is an IPv6 socket by comparing a constant inp vflag. This is expected to help to reduce extra locking.
  Suggested by: rwatson
  Reviewed by: rwatson
  MFC after: 6 weeks
  Notes: svn path=/head/; revision=185371
  Author: Bjoern A. Zeeb | Age: 2008-11-27 | Files: 1 | Lines: -1/+1
* Change the initialization methodology for global variables scheduled for virtualization.
  Instead of initializing the affected global variables at instantiation, assign initial values to them in initializer functions. As a rule, initialization at instantiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks.
  Essentially, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methodology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to initialize multiple instances of such container structures.
  Discussed at: devsummit Strassburg
  Reviewed by: bz, julian
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  Notes: svn path=/head/; revision=185088
  Author: Marko Zec | Age: 2008-11-19 | Files: 1 | Lines: -8/+11
* Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit
  Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs.
  Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
  Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.).
  All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*).
  (*) netipsec/keysock.c did not validate depending on compile time options.
  Implemented by: julian, bz, brooks, zec
  Reviewed by: julian, bz, brooks, kris, rwatson, ...
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  Notes: svn path=/head/; revision=183550
  Author: Marko Zec | Age: 2008-10-02 | Files: 1 | Lines: -16/+23
* Implement IPv6 support for TCP MD5 Signature Option (RFC 2385) the same way it has been implemented for IPv4.
  Reviewed by: bms (skimmed)
  Tested by: Nick Hilliard (nick netability.ie) (with more changes)
  MFC after: 2 months
  Notes: svn path=/head/; revision=183001
  Author: Bjoern A. Zeeb | Age: 2008-09-13 | Files: 1 | Lines: -8/+1
* Add a second KASSERT checking for len >= 0 in the tcp output path.
  This is different from the first one (as len gets updated between those two) and would have caught various edge cases (read: bugs) at a well defined place I had been debugging over the last months, instead of triggering (random) panics further down the call graph.
  MFC after: 2 months
  Notes: svn path=/head/; revision=182841
  Author: Bjoern A. Zeeb | Age: 2008-09-07 | Files: 1 | Lines: -1/+7
* Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@).
  This is the first in a series of commits over the course of the next few weeks.
  Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only.
  We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again.
  Obtained from: //depot/projects/vimage-commit2/...
  Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help)
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  X-MFC after: never
  V_Commit_Message_Reviewed_By: more people than the patch
  Notes: svn path=/head/; revision=181803
  Author: Bjoern A. Zeeb | Age: 2008-08-17 | Files: 1 | Lines: -27/+28
* MFp4 (//depot/projects/tcpecn/):
  TCP ECN support. Merge of my GSoC 2006 work for NetBSD. TCP ECN is defined in RFC 3168.
  Partly reviewed by: dwmalone, silby
  Obtained from: NetBSD
  Notes: svn path=/head/; revision=181056
  Author: Rui Paulo | Age: 2008-07-31 | Files: 1 | Lines: -0/+43
* Fix typo in comment.
  M tcp_output.c
  Notes: svn path=/head/; revision=180535
  Author: Rui Paulo | Age: 2008-07-15 | Files: 1 | Lines: -1/+1
* Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock remain exclusive, and all instances of inpcb lock acquisition are exclusive.
  This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code.
  MFC after: 3 months
  Tested by: kris (superset of committed patch)
  Notes: svn path=/head/; revision=178285
  Author: Robert Watson | Age: 2008-04-17 | Files: 1 | Lines: -1/+1
* Remove TCP options ordering assumptions in tcp_addoptions(). Ordering was changed in rev. 1.161 of tcp_var.h. All options now test for sufficient space in the TCP header before getting added.
  Reported by: Mark Atkinson <atkin901-at-yahoo.com>
  Tested by: Mark Atkinson <atkin901-at-yahoo.com>
  MFC after: 1 week
  Notes: svn path=/head/; revision=177988
  Author: Andre Oppermann | Age: 2008-04-07 | Files: 1 | Lines: -1/+11
* Remove now unnecessary comment.
  Notes: svn path=/head/; revision=177987
  Author: Andre Oppermann | Age: 2008-04-07 | Files: 1 | Lines: -2/+0
* Use #defines for TCP options padding after EOL to be consistent.
  Reviewed by: bz
  Notes: svn path=/head/; revision=177986
  Author: Andre Oppermann | Age: 2008-04-07 | Files: 1 | Lines: -2/+2
* Padding after EOL option must be zeros according to RFC793 but the NOPs used are 0x01. While we could simply pad with EOLs (which are 0x00), rather use an explicit 0x00 constant there to not confuse people with 'EOL padding'. Put in a comment saying just that.
  Problem discussed on: src-committers with andre, silby, dwhite as follow up to the rev. 1.161 commit of tcp_var.h.
  MFC after: 11 days
  Notes: svn path=/head/; revision=176978
  Author: Bjoern A. Zeeb | Age: 2008-03-09 | Files: 1 | Lines: -2/+10
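The padding rule in this commit can be illustrated with a short C sketch: terminate the option list with EOL when it does not end on a 4-byte boundary, then fill the rest with an explicit zero pad byte rather than NOP (0x01). The constant and function names below are hypothetical stand-ins, not the definitions used in the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL 0x00  /* end of option list */
#define OPT_NOP 0x01  /* no-operation; must not be used after EOL */
#define OPT_PAD 0x00  /* explicit zero padding after EOL */

/*
 * opt: option buffer, len bytes already filled, cap bytes available.
 * Returns the new length, which is a multiple of 4.
 */
static size_t
terminate_options(uint8_t *opt, size_t len, size_t cap)
{
	/* The rounded-up length must fit into the buffer. */
	assert((len + 3) / 4 * 4 <= cap);

	if (len % 4 != 0)
		opt[len++] = OPT_EOL;
	while (len % 4 != 0)
		opt[len++] = OPT_PAD;
	return len;
}
```

OPT_PAD and OPT_EOL have the same value here; keeping them as separate names mirrors the commit's point that the padding is a distinct concept from the EOL option itself.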
* Centralize and correct computation of TCP-MD5 signature offset within the packet (tcp header options field).
  Reviewed by: tools/regression/netinet/tcpconnect
  MFC after: 3 days
  Tested by: Nick Hilliard (see net@)
  Notes: svn path=/head/; revision=174120
  Author: Bjoern A. Zeeb | Age: 2007-11-30 | Files: 1 | Lines: -8/+3
* Let opt be an array. Though &opt[0] == opt == &opt, &opt is highly confusing and hard to understand, so change it to just opt and remove the extra cast no longer/not needed.
  Discussed with: rwatson
  MFC after: 3 days
  Notes: svn path=/head/; revision=174023
  Author: Bjoern A. Zeeb | Age: 2007-11-28 | Files: 1 | Lines: -1/+1
* Make TSO work with IPSEC compiled into the kernel.
  The lookup hurts a bit for connections but had been there anyway if IPSEC was compiled in. So moving the lookup up a bit gives us TSO support at no extra cost.
  PR: kern/115586
  Tested by: gallatin
  Discussed with: kmacy
  MFC after: 2 months
  Notes: svn path=/head/; revision=173835
  Author: Bjoern A. Zeeb | Age: 2007-11-21 | Files: 1 | Lines: -3/+16
* Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
    mac_<object>_<method/action>
    mac_<object>_check_<method/action>
  The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
  All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI.
  Sponsored by: SPARTA (original patches against Mac OS X)
  Obtained from: TrustedBSD Project, Apple Computer
  Notes: svn path=/head/; revision=172930
  Author: Robert Watson | Age: 2007-10-24 | Files: 1 | Lines: -1/+1
* Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports.
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=172467
  Author: Mike Silbersack | Age: 2007-10-07 | Files: 1 | Lines: -1/+3
* Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC.
  Approved by: re
  Sponsored by: Secure Computing
  Notes: svn path=/head/; revision=171167
  Author: George V. Neville-Neil | Age: 2007-07-03 | Files: 1 | Lines: -3/+3
* Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files; the rest of the files will follow in a second commit.
  Reviewed by: bz
  Approved by: re
  Supported by: Secure Computing
  Notes: svn path=/head/; revision=171133
  Author: George V. Neville-Neil | Age: 2007-07-01 | Files: 1 | Lines: -6/+1
* Make the handling of the tcp window explicit for the SYN_SENT case in tcp_output(). This is currently not strictly necessary but paves the way to simplify the entire SYN options handling quite a bit. Clarify comment. No change in effective behaviour with this commit.
  RFC1323 requires the window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself never be scaled.
  Notes: svn path=/head/; revision=170470
  Author: Andre Oppermann | Age: 2007-06-09 | Files: 1 | Lines: -4/+10
* Don't send pure window updates when the peer has closed the connection and won't ever send more data.
  Notes: svn path=/head/; revision=170467
  Author: Andre Oppermann | Age: 2007-06-09 | Files: 1 | Lines: -1/+4
* Fix statistical accounting for bytes and packets during sack retransmits.
  MFC after: 1 week
  Submitted by: mohans
  Notes: svn path=/head/; revision=169682
  Author: John Baldwin | Age: 2007-05-18 | Files: 1 | Lines: -1/+1
* Fix an incorrect replace of a timer reference made during the TCP timer rewrite in rev. 1.132. This unmasked yet another bug that causes certain connections to get indefinitely stuck in LAST_ACK state.
  Notes: svn path=/head/; revision=169457
  Author: Andre Oppermann | Age: 2007-05-10 | Files: 1 | Lines: -1/+1
* Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of a dedicated sack_enable int for this bool. Change all users accordingly.
  Notes: svn path=/head/; revision=169317
  Author: Andre Oppermann | Age: 2007-05-06 | Files: 1 | Lines: -4/+6
* o Remove unused and redundant TCP option definitions
  o Replace usage of MAX_TCPOPTLEN with the correctly constructed and derived MAX_TCPOPTLEN
  Notes: svn path=/head/; revision=168904
  Author: Andre Oppermann | Age: 2007-04-20 | Files: 1 | Lines: -4/+4
* Change the TCP timer system from using the callout system five times directly to a merged model where only one callout, the next to fire, is registered.
  Instead of callout_reset(9) and callout_stop(9), the new function tcp_timer_activate() is used, which then internally manages the callout.
  The single new callout is a mutex callout on inpcb, simplifying the locking a bit.
  tcp_timer() is the called function, which handles all race conditions in one place and then dispatches the individual timer functions.
  Reviewed by: rwatson (earlier version)
  Notes: svn path=/head/; revision=168615
  Author: Andre Oppermann | Age: 2007-04-11 | Files: 1 | Lines: -28/+25
* Retire unused TCP_SACK_DEBUG.
  Notes: svn path=/head/; revision=168364
  Author: Andre Oppermann | Age: 2007-04-04 | Files: 1 | Lines: -1/+0
* ANSIfy function declarations and remove register keywords for variables. Consistently apply style to all function declarations.
  Notes: svn path=/head/; revision=167785
  Author: Andre Oppermann | Age: 2007-03-21 | Files: 1 | Lines: -2/+1
* Subtract optlen in the maximum length check for TSO and finally avoid slightly oversized TSO mbuf chains.
  Submitted by: kmacy
  Notes: svn path=/head/; revision=167780
  Author: Andre Oppermann | Age: 2007-03-21 | Files: 1 | Lines: -1/+1
* Match up SYSCTL_INT declarations in style.
  Notes: svn path=/head/; revision=167718
  Author: Andre Oppermann | Age: 2007-03-19 | Files: 1 | Lines: -2/+2
* Maintain a pointer and offset pair into the socket buffer mbuf chain to avoid traversal of the entire socket buffer for larger offsets on stream sockets.
  Adjust tcp_output() to make use of it.
  Tested by: gallatin
  Notes: svn path=/head/; revision=167715
  Author: Andre Oppermann | Age: 2007-03-19 | Files: 1 | Lines: -3/+13
* Consolidate insertion of TCP options into a segment from within tcp_output() and syncache_respond() into its own generic function tcp_addoptions().
  tcp_addoptions() is alignment agnostic and does optimal packing in all cases.
  In struct tcpopt rename to_requested_s_scale to just to_wscale.
  Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled."
  Reviewed by: silby, mohans, julian
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=167606
  Author: Andre Oppermann | Age: 2007-03-15 | Files: 1 | Lines: -145/+198
* Prevent TSO mbuf chain from overflowing a few bytes by subtracting the TCP options size before the TSO total length calculation.
  Bug found by: kmacy
  Notes: svn path=/head/; revision=167139
  Author: Andre Oppermann | Age: 2007-03-01 | Files: 1 | Lines: -2/+3
* Add EHOSTDOWN and ENETUNREACH to the list of soft errors that shouldn't be returned up to the caller.
  PR: 100172
  Submitted by: "Andrew - Supernews" <andrew supernews.net>
  Reviewed by: rwatson, bms
  Notes: svn path=/head/; revision=167107
  Author: Gleb Smirnoff | Age: 2007-02-28 | Files: 1 | Lines: -0/+2
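A minimal sketch of what treating an error as "soft" might look like: the error is swallowed and the caller sees success. Only the two errno values named in this commit are handled here; remembering the error in a separate field is an assumption made for illustration, and the function name is hypothetical. This is not the switch() block from tcp_output().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * error:     result of the lower-layer send
 * softerror: where a soft error is remembered (an assumption of this sketch)
 *
 * Returns 0 for soft errors so they are not passed up to the caller,
 * and the original error code otherwise.
 */
static int
filter_output_error(int error, int *softerror)
{
	assert(softerror != NULL);
	switch (error) {
	case EHOSTDOWN:
	case ENETUNREACH:
		*softerror = error;
		return 0;
	default:
		return error;
	}
}
```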
* Toss the code that handles errors from ip_output(), to make it more readable:
  - Merge two embedded if() into one.
  - Introduce switch() block to handle different kinds of errors.
  Reviewed by: rwatson, bms
  Notes: svn path=/head/; revision=167106
  Author: Gleb Smirnoff | Age: 2007-02-28 | Files: 1 | Lines: -30/+26
* Auto sizing TCP socket buffers.
  Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen: a) your socket buffers are too small and you can't reach the full potential of the network between both hosts; b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around.
  With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions.
  FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size.
  New sysctls are:
    net.inet.tcp.sendbuf_auto=1 (enabled)
    net.inet.tcp.sendbuf_inc=8192 (8K, step size)
    net.inet.tcp.sendbuf_max=262144 (256K, growth limit)
    net.inet.tcp.recvbuf_auto=1 (enabled)
    net.inet.tcp.recvbuf_inc=16384 (16K, step size)
    net.inet.tcp.recvbuf_max=262144 (256K, growth limit)
  Tested by: many (on HEAD and RELENG_6)
  Approved by: re
  MFC after: 1 month
  Notes: svn path=/head/; revision=166405
  Author: Andre Oppermann | Age: 2007-02-01 | Files: 1 | Lines: -4/+70
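The transfer rates quoted in this message follow from the usual bound that a sender can have at most one buffer's worth of data in flight per round trip, so throughput is limited to buffer size divided by RTT. A small C helper (the name is hypothetical) makes the arithmetic explicit:

```c
#include <assert.h>

/*
 * Upper bound on throughput in Mbit/s for a given socket buffer size
 * and round-trip time: buf_bytes * 8 bits, sent once per RTT.
 */
static double
max_mbit_per_s(double buf_bytes, double rtt_ms)
{
	assert(rtt_ms > 0.0);
	return buf_bytes * 8.0 / (rtt_ms / 1000.0) / 1e6;
}
```

For the 32K default (32768 bytes) this gives roughly 2.6 Mbit/s at 100ms and 1.3 Mbit/s at 200ms; for the 256K growth limit (262144 bytes) it gives roughly 21 Mbit/s at 100ms and 10.5 Mbit/s at 200ms, in line with the rounded figures in the commit message.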
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
  This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
  Obtained from: TrustedBSD Project
  Sponsored by: SPARTA
  Notes: svn path=/head/; revision=163606
  Author: Robert Watson | Age: 2006-10-22 | Files: 1 | Lines: -1/+2
* When tcp_output() receives an error upon sending a packet it reverts parts of its internal state to ignore the failed send and try again a bit later. If the error is EPERM the packet got blocked by the local firewall and the revert may cause the session to get stuck and retry indefinitely. This way we treat it like a packet loss and let the retransmit timer and timeouts do their work over time.
  The correct behavior is to drop a connection that gets an EPERM error. However this _may_ introduce some POLA problems and a two commit approach was chosen.
  Discussed with: glebius
  PR: kern/25986
  PR: kern/102653
  Notes: svn path=/head/; revision=162739
  Author: Andre Oppermann | Age: 2006-09-28 | Files: 1 | Lines: -2/+15