path: root/sys/netinet/tcp_timer.c
Commit message    Author    Age    Files    Lines
...
* Move TIME_WAIT related functions and timer handling from filesAndre Oppermann2007-05-161-54/+1
| | | | | | | | | | | | | | | | | | | other than repo copied tcp_subr.c into tcp_timewait.c#1.284: tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck() tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset() tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop() tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan() This is a mechanical move with appropriate renames and making them static if used only locally. The tcp_tw_2msl_scan() cleanup function is still run from the tcp_slowtimo() in tcp_timer.c. Notes: svn path=/head/; revision=169608
* Move universally to ANSI C function declarations, with relativelyRobert Watson2007-05-101-3/+1
| | | | | | | consistent style(9)-ish layout. Notes: svn path=/head/; revision=169454
* Fix two comments.Andre Oppermann2007-05-061-2/+2
| | | | Notes: svn path=/head/; revision=169309
* Change the TCP timer system from using the callout system five timesAndre Oppermann2007-04-111-175/+299
| | | | | | | | | | | | | | | | | | | directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version) Notes: svn path=/head/; revision=168615
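The merged model described in this commit can be sketched as follows. This is a minimal illustration, not the FreeBSD code: the struct, the `TT_*` indices, and the helper names `timers_activate()` and `timers_next()` are hypothetical, and treating a delta of 0 as "stop this timer" is an assumption. Each connection records five deadlines, and only the earliest active one would be handed to the single callout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical indices for the five per-connection TCP timers. */
enum { TT_DELACK, TT_REXMT, TT_PERSIST, TT_KEEP, TT_2MSL, TT_COUNT };

struct tcp_timers {
    uint32_t deadline[TT_COUNT]; /* absolute tick at which each timer fires */
    unsigned active;             /* bitmask of armed timers */
};

/* Arm timer `which` to fire `delta` ticks from `now`; delta == 0 disarms it
 * (an assumed convention for this sketch). */
void timers_activate(struct tcp_timers *t, int which, uint32_t now,
                     uint32_t delta)
{
    assert(which >= 0 && which < TT_COUNT);
    if (delta == 0) {
        t->active &= ~(1u << which);
    } else {
        t->deadline[which] = now + delta; /* unsigned math wraps safely */
        t->active |= 1u << which;
    }
}

/* Return the index of the armed timer that fires next, or -1 if none is
 * armed. Only this timer would need to be registered with the callout. */
int timers_next(const struct tcp_timers *t, uint32_t now)
{
    int best = -1;
    int32_t best_left = 0;

    for (int i = 0; i < TT_COUNT; i++) {
        if (!(t->active & (1u << i)))
            continue;
        /* Signed difference keeps the comparison valid across wraparound. */
        int32_t left = (int32_t)(t->deadline[i] - now);
        if (best == -1 || left < best_left) {
            best = i;
            best_left = left;
        }
    }
    return best;
}
```

Whenever a timer is armed or disarmed, the caller would recompute `timers_next()` and reschedule the one callout only if the earliest deadline changed.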
* Retire unused TCP_SACK_DEBUG.Andre Oppermann2007-04-041-1/+0
| | | | Notes: svn path=/head/; revision=168364
* ANSIfy function declarations and remove register keywords for variables.Andre Oppermann2007-03-211-10/+5
| | | | | | | Consistently apply style to all function declarations. Notes: svn path=/head/; revision=167785
* Match up SYSCTL declaration style.Andre Oppermann2007-03-191-6/+9
| | | | Notes: svn path=/head/; revision=167721
* Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigatesMohan Srinivasan2007-02-261-6/+26
| | | | | | | | | | | potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default. Reviewed by: gnn, silby. Notes: svn path=/head/; revision=167036
* Back when we had T/TCP support, we used to apply differentRuslan Ermilov2006-09-071-43/+20
| | | | | | | | | | | | | | timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that it is gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!). Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues). Notes: svn path=/head/; revision=162111
* Remove a microoptimization for i386 that was a micropessimization for amd64.Ruslan Ermilov2006-09-071-2/+1
| | | | Notes: svn path=/head/; revision=162108
* o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremelyGleb Smirnoff2006-09-061-7/+12
| | | | | | | | | | | | | | | | | | | | | | | badly under high load. For example, with 40k sockets and 25k tcptw entries, a connect() syscall can run for seconds. Debugging showed that it iterates the cycle millions of times and purges thousands of tcptw entries at a time. Besides its practical unusability, this change is architecturally wrong. First, in_pcblookup_local() is used in the connect() and bind() syscalls; no purging of stale entries should be done there. Second, it is a layering violation. o Restore the tcptw purging cycle in tcp_timer_2msl_tw(), which was removed in rev. 1.78 by rwatson. The commit log of that revision says nothing about why the cycle was removed. Now we need this cycle, since the major cleaner of stale tcptw structures is removed. o Disable the probably necessary, but now unused, tcp_twrecycleable() function. Reviewed by: ru Notes: svn path=/head/; revision=162064
* Fixes an edge case bug in timewait handling where ticks rolling over causingMohan Srinivasan2006-08-111-4/+3
| | | | | | | | the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry). Reviewed by: silby Notes: svn path=/head/; revision=161226
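The hazard in this fix can be illustrated with a small sketch. It assumes the corruption came from treating an expiry value of 0 as "not on the queue", which the commit message suggests but does not state. The struct and the helpers `tw_arm()` and `tw_expired()` are hypothetical, and an unsigned 32-bit counter stands in for the kernel's tick counter. The sketch keeps queue membership in a separate flag, so an expiry that wraps to exactly 0 remains a valid value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tw_entry {
    uint32_t expiry; /* absolute tick at which the entry expires */
    bool queued;     /* explicit membership flag, not "expiry != 0" */
};

/* Put the entry on the (notional) queue with a deadline `timeout` ticks
 * after `now`. The sum may wrap to any value, including exactly 0. */
void tw_arm(struct tw_entry *tw, uint32_t now, uint32_t timeout)
{
    assert(timeout > 0 && timeout <= INT32_MAX);
    tw->expiry = now + timeout; /* unsigned arithmetic wraps modulo 2^32 */
    tw->queued = true;
}

/* Wrap-safe expiry test: compare via the signed difference rather than
 * comparing the absolute tick values directly. */
bool tw_expired(const struct tw_entry *tw, uint32_t now)
{
    return tw->queued && (int32_t)(now - tw->expiry) >= 0;
}
```

With `now = UINT32_MAX - 59` and a 60-tick timeout, the computed expiry is exactly 0, yet the entry is still reported as queued and expires on schedule.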
* When entering a timer on a tcpcb, don't continue processing if it has beenRobert Watson2006-06-031-9/+14
| | | | | | | | | | | | | | | dropped. This prevents a bug introduced during the socket/pcb refcounting work from occurring, in which occasionally the retransmit timer may fire after a connection has been reset, so that the resulting R|A TCP packet has a source port of 0, as the port reservation has been released. While here, fix up some RUNLOCK->WUNLOCK bugs. MFC after: 1 month Notes: svn path=/head/; revision=159199
* - Backout one line from 1.78. The tp can be freed by tcp_drop().Gleb Smirnoff2006-05-161-3/+2
| | | | | | | | | - Style next line. Coverity ID: 912 Notes: svn path=/head/; revision=158644
* Only return (tw) from tcp_twclose() if reuse is passed, otherwiseRobert Watson2006-05-051-1/+1
| | | | | | | | | | | | | return NULL. In principle this shouldn't change the behavior, but avoids returning a potentially invalid/inappropriate pointer to the caller. Found with: Coverity Prevent (tm) Submitted by: pjd MFC after: 3 months Notes: svn path=/head/; revision=158304
* Update TCP for infrastructural changes to the socket/pcb refcount model,Robert Watson2006-04-011-16/+61
| pru_abort(), pru_detach(), and in_pcbdetach():
|
| - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code.
|
| - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, the receive code no longer requires the pcbinfo lock, and the send code only requires it if building a new connection on an otherwise unconnected socket triggered via sendto() with an address. This should significantly reduce tcbinfo lock contention in the receive and send cases.
|
| - In order to support the invariant that so_pcb != NULL, it is now necessary for the TCP code to not discard the tcpcb any time a connection is dropped, but instead leave the tcpcb until the socket is shutdown. This case is handled by setting INP_DROPPED, to substitute for using a NULL so_pcb to indicate that the connection has been dropped. This requires the inpcb lock, but not the pcbinfo lock.
|
| - Unlike all other protocols in the tree, TCP may need to retain access to the socket after the file descriptor has been closed. Set SS_PROTOREF in tcp_detach() in order to prevent the socket from being freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether or not it needs to free the socket when the connection finally does close. The typical case where this occurs is if close() is called on a TCP socket before all sent data in the send socket buffer has been transmitted or acknowledged. If INP_SOCKREF is found when the connection is dropped, we release the inpcb, tcpcb, and socket instead of flagging INP_DROPPED.
|
| - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this.
|
| - Annotate the existence of a long-standing race in the TCP timer code, in which timers are stopped but not drained when the socket is freed, as waiting for drain may lead to deadlocks, or have to occur in a context where waiting is not permitted. This race has been handled by testing to see if the tcpcb pointer in the inpcb is NULL (and vice versa), which is not normally permitted, but may be true if an inpcb and tcpcb have been freed. Add a counter to test how often this race has actually occurred, and a large comment for each instance where we compare potentially freed memory with NULL. This will have to be fixed in the near future, but requires us to further address how to handle the timer shutdown issue.
|
| - Several TCP calls no longer potentially free the passed inpcb/tcpcb, so no longer need to return a pointer to indicate whether the argument passed in is still valid.
|
| - Un-macroize debugging and locking setup for various protocol switch methods for TCP, as it led to more obscurity, and as locking becomes more customized to the methods, offers less benefit.
|
| - Assert copyright on tcp_usrreq.c due to significant modifications that have been made as part of this work.
|
| These changes significantly modify the memory management and connection logic of our TCP implementation, and are (as such) High Risk Changes, and likely to contain serious bugs. Please report problems to the current@ mailing list ASAP, ideally with simple test cases, and optionally, packet traces.
|
| MFC after: 3 months
|
| Notes: svn path=/head/; revision=157376
* Explicitly assert socket pointer is non-NULL in tcp_input() so as toRobert Watson2006-03-261-8/+8
| | | | | | | | | | | | provide better debugging information. Prefer explicit comparison to NULL for tcpcb pointers rather than treating them as booleans. MFC after: 1 month Notes: svn path=/head/; revision=157136
* Make sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) generally available insteadAndre Oppermann2006-02-161-20/+0
| | | | | | | | | | of being private to tcp_timer.c. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days Notes: svn path=/head/; revision=155758
* Remove no-op spl's and most comment references to spls, as TCP lockingRobert Watson2005-07-191-16/+0
| | | | | | | | | is believed to be basically done (modulo any remaining bugs). MFC after: 3 days Notes: svn path=/head/; revision=148156
* Replace t_force with a t_flag (TF_FORCEDATA).Paul Saab2005-05-211-2/+2
| | | | | | | | Submitted by: Raja Mukerji. Reviewed by: Mohan, Silby, Andre Oppermann. Notes: svn path=/head/; revision=146463
* /* -> /*- for license, minor formatting changesWarner Losh2005-01-071-1/+1
| | | | Notes: svn path=/head/; revision=139823
* Remove the now unused tcp_canceltimers() function. tcpcb timers areRobert Watson2004-12-231-15/+0
| | | | | | | | | now stopped as part of tcp_discardcb(). MFC after: 2 weeks Notes: svn path=/head/; revision=139220
* Remove an annotation of a minor race relating to the update ofRobert Watson2004-12-231-7/+0
| | | | | | | | | | | | multiple MIB entries using sysctl in short order, which might result in unexpected values for tcp_maxidle being generated by tcp_slowtimo. In practice, this will not happen, or at least, doesn't require an explicit comment. MFC after: 2 weeks Notes: svn path=/head/; revision=139219
* Assert the tcptw inpcb lock in tcp_timer_2msl_reset(), as fields inRobert Watson2004-12-051-0/+1
| | | | | | | | | the tcptw undergo non-atomic read-modify-writes. MFC after: 2 weeks Notes: svn path=/head/; revision=138416
* tcp_timewait() performs multiple non-atomic reads on the tcptwRobert Watson2004-11-231-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_timer_2msl_reset(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents reasons for it. In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset() is called, which requires that lock. In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock. Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock). In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified. In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified. In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified. MFC after: 2 weeks Notes: svn path=/head/; revision=138025
* De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possibleRobert Watson2004-11-231-15/+11
| | | | | | | | | | | | | | | | | | but unlikely races that could be corrected by having tcp_keepcnt and tcp_keepintvl modifications go through handler functions via sysctl, but probably is not worth doing. Updates to multiple sysctls within evaluation of a single addition are unlikely. Annotate that tcp_canceltimers() is currently unused. De-spl tcp_timer_delack(). De-spl tcp_timer_2msl(). MFC after: 2 weeks Notes: svn path=/head/; revision=138024
* Remove RFC1644 T/TCP support from the TCP side of the network stack.Andre Oppermann2004-11-021-2/+2
| | | | | | | | | | | | | | | | | | | A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706 Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality. Discussed on: -arch Notes: svn path=/head/; revision=137139
* White space cleanup for netinet before branch:Robert Watson2004-08-161-10/+10
| | | | | | | | | | | | | | - Trailing tab/space cleanup - Remove spurious spaces between or before tabs This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET. Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net> Notes: svn path=/head/; revision=133874
* Add support for TCP Selective Acknowledgements. The work for thisPaul Saab2004-06-231-0/+3
| | | | | | | | | | | | | | | | | | originated on RELENG_4 and was ported to -CURRENT. The scoreboarding code was obtained from OpenBSD, and many of the remaining changes were inspired by OpenBSD, but not taken directly from there. You can enable/disable sack using net.inet.tcp.do_sack. You can also limit the number of sack holes that all senders can have in the scoreboard with net.inet.tcp.sackhole_limit. Reviewed by: gnn Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan) Notes: svn path=/head/; revision=130989
* Remove advertising clause from University of California Regent'sWarner Losh2004-04-071-4/+0
| | | | | | | | | | license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson Notes: svn path=/head/; revision=128019
* Introduce tcp_hostcache and remove the tcp specific metrics fromAndre Oppermann2003-11-201-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl) Notes: svn path=/head/; revision=122922
* use local values instead of chasing pointersSam Leffler2003-11-081-3/+2
| | | | | | | Supported by: FreeBSD Foundation Notes: svn path=/head/; revision=122326
* Unify the "send high" and "recover" variables as specified in theJeffrey Hsu2003-07-151-2/+7
| | | | | | | | | | | | | | | latest rev of the spec. Use an explicit flag for Fast Recovery. [1] Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2] Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com> Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2] Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1] Notes: svn path=/head/; revision=117650
* Compensate for decreasing the minimum retransmit timeout.Jeffrey Hsu2003-06-041-2/+2
| | | | | | | Reviewed by: jlemon Notes: svn path=/head/; revision=115824
* Remove a panic(); if the zone allocator can't provide more timewaitJonathan Lemon2003-03-081-22/+61
| | | | | | | | | | structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer. Sponsored by: DARPA, NAI Labs Notes: svn path=/head/; revision=112009
* Add a TCP TIMEWAIT state which uses less space than a fullblown TCPJonathan Lemon2003-02-191-0/+27
| | | | | | | | | | | control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs Notes: svn path=/head/; revision=111145
* Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so theJonathan Lemon2003-02-191-1/+1
| | | | | | | | | | | routine does not require a tcpcb to operate. Since we no longer keep template mbufs around, move pseudo checksum out of this routine, and merge it with the length update. Sponsored by: DARPA, NAI Labs Notes: svn path=/head/; revision=111144
* Fix NewReno.Jeffrey Hsu2003-01-131-5/+2
| | | | | | | Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com> Notes: svn path=/head/; revision=109175
* Validate inp to prevent a use after free.Jeffrey Hsu2002-12-241-0/+25
| | | | Notes: svn path=/head/; revision=108265
* Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of dependingBruce Evans2002-09-051-8/+6
| | | | | | | | | on namespace pollution 4 layers deep in <netinet/in_pcb.h>. Removed unused includes. Sorted includes. Notes: svn path=/head/; revision=102967
* Fix overflows in intermediate calculations in sysctl_msec_to_ticks().John Polstra2002-07-201-2/+2
| | | | | | | | | | At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle to be reported as negative. MFC after: 3 days Notes: svn path=/head/; revision=100420
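The overflow described here is easy to reproduce on paper: a two-hour keepidle at hz = 1000 is 7,200,000 ticks, and multiplying that by 1000 to get milliseconds exceeds the range of a 32-bit int. The sketch below shows one common way to avoid the problem, by widening the intermediate product to 64 bits. It is an illustration only; the helper names are hypothetical and the actual two-line fix in this commit is not shown in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Convert ticks to milliseconds. The product ticks * 1000 is computed in
 * 64 bits so it cannot overflow before the division by hz. The final
 * result is assumed to fit in 32 bits. */
int32_t ticks_to_msec(int32_t ticks, int32_t hz)
{
    assert(hz > 0);
    return (int32_t)(((int64_t)ticks * 1000) / hz);
}

/* Convert milliseconds to ticks, again with a 64-bit intermediate. */
int32_t msec_to_ticks(int32_t msec, int32_t hz)
{
    assert(hz > 0);
    return (int32_t)(((int64_t)msec * hz) / 1000);
}
```

With these helpers, 7,200,000 ticks at hz = 1000 converts to 7,200,000 ms instead of a negative value.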
* Introduce two new sysctl's:Matthew Dillon2002-07-181-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net.inet.tcp.rexmit_min (default 3 ticks equiv) This sysctl is the retransmit timer RTO minimum, specified in milliseconds. This value is designed for algorithmic stability only. net.inet.tcp.rexmit_slop (default 200ms) This sysctl is the retransmit timer RTO slop which is added to every retransmit timeout and is designed to handle protocol stack overheads and delayed ack issues. Note that the *original* code applied a 1-second RTO minimum but never applied real slop to the RTO calculation, so any RTO calculation over one second would have no slop and thus not account for protocol stack overheads (TCP timestamps are not a measure of protocol turnaround!). Essentially, the original code made the RTO calculation almost completely irrelevant. Please note that the 200ms slop is debatable. This commit is not meant to be a line in the sand, and if the community winds up deciding that increasing it is the correct solution then it's easy to do. Note that larger values will destroy performance on lossy networks while smaller values may result in a greater number of unnecessary retransmits. Notes: svn path=/head/; revision=100335
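How the two sysctls might combine can be sketched as below. This is a hedged illustration, not the code from this commit: the function name `rto_with_slop_ms()` and the `max_ms` ceiling are hypothetical, all values are in milliseconds rather than ticks, and the order of operations (add the slop first, then clamp to the minimum and maximum) is an assumption that the commit message does not spell out.

```c
#include <assert.h>

/* Apply the slop to a computed RTO and clamp the result to [min_ms, max_ms].
 * Assumed order: slop is added first, then the range is enforced. */
int rto_with_slop_ms(int computed_ms, int slop_ms, int min_ms, int max_ms)
{
    assert(min_ms <= max_ms);

    int rto = computed_ms + slop_ms;
    if (rto < min_ms)
        rto = min_ms;
    if (rto > max_ms)
        rto = max_ms;
    return rto;
}
```

For example, with a 200 ms slop and a 30 ms minimum (3 ticks at an assumed hz of 100), a computed RTO of 1500 ms becomes 1700 ms. Under the older scheme described in the message, any RTO above one second would have received no slop at all.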
* Lock up inpcb.Jeffrey Hsu2002-06-101-0/+51
| | | | | | | Submitted by: Jennifer Yang <yangjihui@yahoo.com> Notes: svn path=/head/; revision=98102
* Back out my last commit of locking down a socket, it conflicts with hsu's work.Seigo Tanimura2002-05-311-51/+17
| | | | | | | Requested by: hsu Notes: svn path=/head/; revision=97658
* Lock down a socket, milestone 1.Seigo Tanimura2002-05-201-17/+51
| o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket.
|
| o Determine the lock strategy for each member in struct socket.
|
| o Lock down the following members:
|   - so_count
|   - so_options
|   - so_linger
|   - so_state
|
| o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket:
|   - sodisconnect()
|   - soisconnected()
|   - soisconnecting()
|   - soisdisconnected()
|   - soisdisconnecting()
|   - sofree()
|   - soref()
|   - sorele()
|   - sorwakeup()
|   - sotryfree()
|   - sowakeup()
|   - sowwakeup()
|
| Reviewed by: alfred
|
| Notes: svn path=/head/; revision=96972
* o Our current userland boot code (due to rc.conf and rc.network) alwaysRobert Watson2001-12-071-1/+1
| | | | | | | | enables TCP keepalives using the net.inet.tcp.always_keepalive by default. Synchronize the kernel default with the userland default. Notes: svn path=/head/; revision=87499
* Much delayed but now present: RFC 1948 style sequence numbersMike Silbersack2001-08-221-2/+0
| | | | | | | | | | | | | In order to ensure security and functionality, RFC 1948 style initial sequence number generation has been implemented. Barring any major cryptographic breakthroughs, this algorithm should be unbreakable. In addition, the problems with TIME_WAIT recycling which affect our currently used algorithm are not present. Reviewed by: jesper Notes: svn path=/head/; revision=82122
* Temporary feature: Runtime tuneable tcp initial sequence numberMike Silbersack2001-07-081-0/+2
| | | | | | | | | | | | | | | | | | | | | generation scheme. Users may now select between the currently used OpenBSD algorithm and the older random positive increment method. While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT handling; this is causing trouble for an increasing number of folks. To switch between generation schemes, one sets the sysctl net.inet.tcp.tcp_seq_genscheme. 0 = random positive increments, 1 = the OpenBSD algorithm. 1 is still the default. Once a secure _and_ compatible algorithm is implemented, this sysctl will be removed. Reviewed by: jlemon Tested by: numerous subscribers of -net Notes: svn path=/head/; revision=79413
* Eliminate the allocation of a tcp template structure for eachMike Silbersack2001-06-231-3/+9
| | | | | | | | | | | | | | | connection. The information contained in a tcptemp can be reconstructed from a tcpcb when needed. Previously, tcp templates required the allocation of one mbuf per connection. On large systems, this change should free up a large number of mbufs. Reviewed by: bmilekic, jlemon, ru MFC after: 2 weeks Notes: svn path=/head/; revision=78642
* Disable rfc1323 and rfc1644 TCP extensions if we haven't gotJesper Skriver2001-05-311-0/+9
| | | | | | | | | | | | | | | any response to our third SYN to work-around some broken terminal servers (most of which have hopefully been retired) that have bad VJ header compression code which trashes TCP segments containing unknown-to-them TCP options. PR: kern/1689 Submitted by: jesper Reviewed by: wollman MFC after: 2 weeks Notes: svn path=/head/; revision=77539