path: root/sys/netinet/tcp_var.h
Commit message | Author | Age | Files | Lines
* Fix cast-qualifiers warning when INET6 is not present | Peter Wemm | 2007-07-05 | 1 | -1/+1
  Approved by: re (rwatson)
  Notes: svn path=/head/; revision=171229
* Refactor and rewrite in parts the SYN handling code on listen sockets | Andre Oppermann | 2007-05-28 | 1 | -1/+2
  in tcp_input():
   o tighten the checks on allowed TCP flags to conform to RFC793 and tcp-secure
   o log check failures to syslog at LOG_DEBUG level
   o rearrange the code flow to be easier to follow
   o add KASSERTs to validate assumptions of the code flow

  Add sysctl net.inet.tcp.syncache.rst_on_sock_fail, enabled by default, which controls the behavior on socket creation failure after an otherwise successful 3-way handshake. The socket creation can fail due to global memory shortage, listen queue limits and file descriptor limits. The sysctl allows choosing between two options to deal with this. One is to send a reset to the other endpoint to notify it about the failure (default). The other is to ignore the failure, treat it as a transient error, and have the other endpoint retransmit for another try.

  Reviewed by: rwatson (in general)
  Notes: svn path=/head/; revision=170055
* Add tcp_log_addrs() function to generate a standardized TCP log line | Andre Oppermann | 2007-05-18 | 1 | -0/+3
  for use throughout the tcp subsystem. It is IPv4 and IPv6 aware and creates a line in the following format:

   "TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>"

  A "\n" is not included at the end. The caller is supposed to add further information after the standard tcp log header.

  The function returns a NUL terminated string which the caller has to free(s, M_TCPLOG) after use. All memory allocation is done with M_NOWAIT and the return value may be NULL in memory shortage situations.

  Either struct in_conninfo || (struct tcphdr && (struct ip || struct ip6_hdr)) has to be supplied.

  Due to ip[6].h header inclusion limitations and ordering issues the struct ip and struct ip6_hdr parameters have to be cast and passed as void * pointers.

   tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr, void *ip6hdr)

  Usage example:

   struct ip *ip;
   char *tcplog;

   if ((tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) != NULL) {
           log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n", tcplog, __func__);
           free(tcplog, M_TCPLOG);
   }

  Notes: svn path=/head/; revision=169683
* Move TIME_WAIT related functions and timer handling from files | Andre Oppermann | 2007-05-16 | 1 | -0/+2
  other than repo copied tcp_subr.c into tcp_timewait.c#1.284:

   tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()
   tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
   tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
   tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()

  This is a mechanical move with appropriate renames and making them static if used only locally.

  The tcp_tw_2msl_scan() cleanup function is still run from the tcp_slowtimo() in tcp_timer.c.

  Notes: svn path=/head/; revision=169608
* Complete the (mechanical) move of the TCP reassembly and timewait | Andre Oppermann | 2007-05-13 | 1 | -0/+15
  functions from their original place to their own files.

   TCP Reassembly from tcp_input.c -> tcp_reass.c
   TCP Timewait from tcp_subr.c -> tcp_timewait.c

  Notes: svn path=/head/; revision=169541
* Add the timestamp offset to struct tcptw so we can generate proper | Andre Oppermann | 2007-05-11 | 1 | -0/+1
  ACKs in TIME_WAIT state that don't get dropped by the PAWS check on the receiver.

  Notes: svn path=/head/; revision=169477
* Remove unused requested_s_scale from struct tcpcb. | Andre Oppermann | 2007-05-06 | 1 | -1/+0
  Notes: svn path=/head/; revision=169318
* Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of | Andre Oppermann | 2007-05-06 | 1 | -1/+0
  a dedicated sack_enable int for this bool. Change all users accordingly.

  Notes: svn path=/head/; revision=169317
* Add global mutex tcp_debug_mtx, which will protect global TCP debugging | Robert Watson | 2007-05-04 | 1 | -1/+1
  state tcp_debug, tcp_debx. Acquire and drop as required in tcp_trace().

  Move to ANSI C function header, correct prototype types so that short TCP state is no longer promoted to int unnecessarily.

  Add comments.

  MFC after: 3 weeks
  Notes: svn path=/head/; revision=169272
* o Remove unnecessary TOF_SIGLEN flag from struct tcpopt | Andre Oppermann | 2007-04-20 | 1 | -6/+5
  o Correctly set to->to_signature in tcp_dooptions()
  o Update comments

  Notes: svn path=/head/; revision=168906
* Remove bogus check for accept queue length and associated failure handling | Andre Oppermann | 2007-04-20 | 1 | -1/+1
  from the incoming SYN handling section of tcp_input().

  Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then.

  Change return value of syncache_add() to void. No status communication is required.

  Notes: svn path=/head/; revision=168903
* Change the TCP timer system from using the callout system five times | Andre Oppermann | 2007-04-11 | 1 | -7/+3
  directly to a merged model where only one callout, the next to fire, is registered.

  Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout.

  The single new callout is a mutex callout on inpcb simplifying the locking a bit.

  tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions.

  Reviewed by: rwatson (earlier version)
  Notes: svn path=/head/; revision=168615
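The merged model described above can be illustrated with a toy sketch. This is not the kernel code: the slot names, `struct toy_timers`, `toy_timer_activate()` and `toy_timer_next()` are hypothetical and only show the idea of keeping per-timer deadlines while selecting the single one that fires next.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical timer slots; names are illustrative, not the kernel's. */
enum { TT_DELACK, TT_REXMT, TT_PERSIST, TT_KEEP, TT_2MSL, TT_COUNT };

struct toy_timers {
	int deadline[TT_COUNT];	/* absolute tick of expiry; 0 means inactive */
};

/* Arm one timer (delta > 0) or disarm it (delta == 0). */
void
toy_timer_activate(struct toy_timers *t, int which, int now, int delta)
{
	assert(which >= 0 && which < TT_COUNT);
	t->deadline[which] = (delta > 0) ? now + delta : 0;
}

/*
 * Return the slot of the next timer to fire, or -1 if none is active.
 * In the merged model only this one would be handed to the callout layer.
 */
int
toy_timer_next(const struct toy_timers *t)
{
	int best = -1;
	int best_deadline = INT_MAX;

	for (int i = 0; i < TT_COUNT; i++) {
		if (t->deadline[i] != 0 && t->deadline[i] < best_deadline) {
			best = i;
			best_deadline = t->deadline[i];
		}
	}
	return (best);
}
```

With a keepalive timer armed 7200 ticks out and a retransmit timer armed 30 ticks out, `toy_timer_next()` selects the retransmit slot; after disarming it, the keepalive slot becomes the next to fire.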
* Remove tcp_minmssoverload DoS detection logic. The problem it tried to | Andre Oppermann | 2007-03-21 | 1 | -5/+0
  protect us from wasn't really there and it only bloats the code. Should the problem surface in the future we can simply resurrect it from cvs history.

  Notes: svn path=/head/; revision=167772
* Consolidate insertion of TCP options into a segment from within tcp_output() | Andre Oppermann | 2007-03-15 | 1 | -8/+14
  and syncache_respond() into its own generic function tcp_addoptions().

  tcp_addoptions() is alignment agnostic and does optimal packing in all cases.

  In struct tcpopt rename to_requested_s_scale to just to_wscale.

  Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled."

  Reviewed by: silby, mohans, julian
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=167606
* Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates | Mohan Srinivasan | 2007-02-26 | 1 | -0/+3
  potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default.

  Reviewed by: gnn, silby.
  Notes: svn path=/head/; revision=167036
* Auto sizing TCP socket buffers. | Andre Oppermann | 2007-02-01 | 1 | -0/+2
  Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen: a) your socket buffers are too small and you can't reach the full potential of the network between both hosts; b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around.

  With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions.

  FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size.

  New sysctls are:
   net.inet.tcp.sendbuf_auto=1 (enabled)
   net.inet.tcp.sendbuf_inc=8192 (8K, step size)
   net.inet.tcp.sendbuf_max=262144 (256K, growth limit)
   net.inet.tcp.recvbuf_auto=1 (enabled)
   net.inet.tcp.recvbuf_inc=16384 (16K, step size)
   net.inet.tcp.recvbuf_max=262144 (256K, growth limit)

  Tested by: many (on HEAD and RELENG_6)
  Approved by: re
  MFC after: 1 month
  Notes: svn path=/head/; revision=166405
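The throughput figures quoted above follow from the usual window-limited bound: at most one buffer's worth of data can be in flight per round trip. The sketch below only reproduces that arithmetic; `max_mbits()` is a hypothetical helper name, and the bound ignores protocol overhead and congestion control.

```c
#include <assert.h>

/*
 * Upper bound on throughput, in Mbit/s, when at most buf_bytes can be
 * in flight per round trip of rtt_ms milliseconds:
 *   rate = buf_bytes * 8 / RTT
 */
double
max_mbits(double buf_bytes, double rtt_ms)
{
	assert(rtt_ms > 0.0);
	return (buf_bytes * 8.0 / (rtt_ms / 1000.0) / 1e6);
}
```

For the 32K default this gives roughly 2.6 Mbit/s at 100ms and 1.3 Mbit/s at 200ms; for the 256K sendbuf_max limit it gives roughly 21 Mbit/s and 10.5 Mbit/s. The ratio of the two buffer sizes is 8, so the "factor 10" in the message reads as a round figure.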
* Rewrite of TCP syncookies to remove locking requirements and to enhance | Andre Oppermann | 2006-09-13 | 1 | -4/+5
  functionality:

  - Remove a rwlock acquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead.
  - Syncookie secrets are different for and stored per syncache bucket row. Secrets expire after 16 seconds and are reseeded on-demand.
  - The computational overhead for syncookie generation and verification is one MD5 hash computation as before.
  - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1.

  This implementation extends the original idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled.

  The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK.

  A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c.

  Reviewed by: silby (slightly earlier version)
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=162277
* Back when we had T/TCP support, we used to apply different | Ruslan Ermilov | 2006-09-07 | 1 | -1/+1
  timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that it has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!).

  Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).

  Notes: svn path=/head/; revision=162111
* First step of TSO (TCP segmentation offload) support in our network stack. | Andre Oppermann | 2006-09-06 | 1 | -2/+3
   o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6
   o add CSUM_TSO flag to mbuf pkthdr csum_flags field
   o add tso_segsz field to mbuf pkthdr
   o enhance ip_output() packet length check to allow for large TSO packets
   o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities
   o adjust all callers of tcp_maxmtu[46]() accordingly

  Discussed on: -current, -net
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=162084
* o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely | Gleb Smirnoff | 2006-09-06 | 1 | -0/+2
  badly under high load. For example with 40k sockets and 25k tcptw entries, the connect() syscall can run for seconds. Debugging showed that it iterates the cycle millions of times and purges thousands of tcptw entries at a time. Besides practical unusability this change is architecturally wrong. First, in_pcblookup_local() is used in connect() and bind() syscalls. No purging of stale entries should be done here. Second, it is a layering violation.
  o Restore the tcptw purging cycle in tcp_timer_2msl_tw(), which was removed in rev. 1.78 by rwatson. The commit log of this revision tells nothing about the reason the cycle was removed. Now we need this cycle, since the major cleaner of stale tcptw structures is removed.
  o Disable the probably necessary, but now unused tcp_twrecycleable() function.

  Reviewed by: ru
  Notes: svn path=/head/; revision=162064
* Some cleanups and janitorial work to tcp_dooptions(): | Andre Oppermann | 2006-06-26 | 1 | -3/+8
   o redefine the parameter 'is_syn' to 'flags', add TO_SYN flag and adjust its usage accordingly
   o update the comments to the tcp_dooptions() invocation in tcp_input():after_listen to reflect reality
   o move the logic checking the echoed timestamp out of tcp_dooptions() to the only place that uses it next to the invocation described in the previous item
   o adjust parsing of TCPOPT_SACK_PERMITTED to use the same style as the others
   o add comments to the struct tcpopt.to_flags #defines

  No functional changes.

  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=159949
* Move all syncache related structures to tcp_syncache.c. They are only used | Andre Oppermann | 2006-06-18 | 1 | -39/+4
  there.

  This unbreaks userland programs that include tcp_var.h.

  Discussed with: rwatson
  Notes: svn path=/head/; revision=159725
* Rearrange fields in struct syncache and syncache_head to make them more | Andre Oppermann | 2006-06-17 | 1 | -5/+6
  cache line friendly.

  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=159698
* Add locking to TCP syncache and drop the global tcpinfo lock as early | Andre Oppermann | 2006-06-17 | 1 | -6/+9
  as possible for the syncache_add() case. The syncache timer no longer acquires the tcpinfo lock and timeout/retransmit runs can happen in parallel with bucket granularity.

  On a P4 the additional locks cause a slight degradation of 0.7% in tcp connections per second. When IP and TCP input are deserialized and can run in parallel this little overhead can be neglected.

  The syncookie handling still leaves room for improvement and its random salts may be moved to the syncache bucket head structures to remove the second lock operation currently required for it. However this would be a more involved change from the way syncookies work at the moment.

  Reviewed by: rwatson
  Tested by: rwatson, ps (earlier version)
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=159695
* Update TCP for infrastructural changes to the socket/pcb refcount model, | Robert Watson | 2006-04-01 | 1 | -2/+2
  pru_abort(), pru_detach(), and in_pcbdetach():

  - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code.

  - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, the receive code no longer requires the pcbinfo lock, and the send code only requires it if building a new connection on an otherwise unconnected socket triggered via sendto() with an address. This should significantly reduce tcbinfo lock contention in the receive and send cases.

  - In order to support the invariant that so_pcb != NULL, it is now necessary for the TCP code to not discard the tcpcb any time a connection is dropped, but instead leave the tcpcb until the socket is shutdown. This case is handled by setting INP_DROPPED, to substitute for using a NULL so_pcb to indicate that the connection has been dropped. This requires the inpcb lock, but not the pcbinfo lock.

  - Unlike all other protocols in the tree, TCP may need to retain access to the socket after the file descriptor has been closed. Set SS_PROTOREF in tcp_detach() in order to prevent the socket from being freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether or not it needs to free the socket when the connection finally does close. The typical case where this occurs is if close() is called on a TCP socket before all sent data in the send socket buffer has been transmitted or acknowledged. If INP_SOCKREF is found when the connection is dropped, we release the inpcb, tcpcb, and socket instead of flagging INP_DROPPED.

  - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this.

  - Annotate the existence of a long-standing race in the TCP timer code, in which timers are stopped but not drained when the socket is freed, as waiting for drain may lead to deadlocks, or have to occur in a context where waiting is not permitted. This race has been handled by testing to see if the tcpcb pointer in the inpcb is NULL (and vice versa), which is not normally permitted, but may be true if an inpcb and tcpcb have been freed. Add a counter to test how often this race has actually occurred, and a large comment for each instance where we compare potentially freed memory with NULL. This will have to be fixed in the near future, but requires us to further address how to handle the timer shutdown issue.

  - Several TCP calls no longer potentially free the passed inpcb/tcpcb, so no longer need to return a pointer to indicate whether the argument passed in is still valid.

  - Un-macroize debugging and locking setup for various protocol switch methods for TCP, as it led to more obscurity, and as locking becomes more customized to the methods, offers less benefit.

  - Assert copyright on tcp_usrreq.c due to significant modifications that have been made as part of this work.

  These changes significantly modify the memory management and connection logic of our TCP implementation, and are (as such) High Risk Changes, and likely to contain serious bugs. Please report problems to the current@ mailing list ASAP, ideally with simple test cases, and optionally, packet traces.

  MFC after: 3 months
  Notes: svn path=/head/; revision=157376
* Rework TCP window scaling (RFC1323) to properly scale the send window | Andre Oppermann | 2006-02-28 | 1 | -1/+1
  right from the beginning and partly clean up the differences in handling between SYN_SENT and SYN_RCVD (syncache).

  Further changes to this code to come. This is a first incremental step to a general overhaul and streamlining of the TCP code.

  PR: kern/15095
  PR: kern/92690 (partly)
  Reviewed by: qingli (and tested with ANVL)
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=156125
* Have TCP Inflight disable itself if the RTT is below a certain | Andre Oppermann | 2006-02-16 | 1 | -0/+1
  threshold. Inflight doesn't make sense on a LAN as it has trouble figuring out the maximal bandwidth because of the coarse tick granularity.

  The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold in milliseconds below which inflight will disengage. It defaults to 10ms.

  Tested by: Joao Barros <joao.barros-at-gmail.com>, Rich Murphey <rich-at-whiteoaklabs.com>
  Sponsored by: TCP/IP Optimization Fundraise 2005
  Notes: svn path=/head/; revision=155767
* - Postpone SACK option processing until after PAWS checks. SACK option | Paul Saab | 2005-06-27 | 1 | -3/+3
    processing is now done in the ACK processing case.
  - Merge tcp_sack_option() and tcp_del_sackholes() into a new function called tcp_sack_doack().
  - Test (SEG.ACK < SND.MAX) before processing the ACK.

  Submitted by: Noritoshi Demizu
  Reviewed by: Mohan Srinivasan, Raja Mukerji
  Approved by: re
  Notes: svn path=/head/; revision=147637
* Changes to tcp_sack_option() that | Paul Saab | 2005-06-04 | 1 | -1/+2
  - Walk the scoreboard backwards from the tail to reduce the number of comparisons for each sack option received.
  - Introduce functions to add/remove sack scoreboard elements, making the code more readable.

  Submitted by: Noritoshi Demizu
  Reviewed by: Raja Mukerji, Mohan Srinivasan
  Notes: svn path=/head/; revision=146953
* This conforms to the terminology in | Paul Saab | 2005-05-25 | 1 | -1/+1
  M.Mathis and J.Mahdavi, "Forward Acknowledgement: Refining TCP Congestion Control" SIGCOMM'96, August 1996.

  Submitted by: Noritoshi Demizu, Raja Mukerji
  Notes: svn path=/head/; revision=146630
* Replace t_force with a t_flag (TF_FORCEDATA). | Paul Saab | 2005-05-21 | 1 | -1/+1
  Submitted by: Raja Mukerji.
  Reviewed by: Mohan, Silby, Andre Oppermann.
  Notes: svn path=/head/; revision=146463
* When looking for the next hole to retransmit from the scoreboard, | Paul Saab | 2005-05-11 | 1 | -1/+6
  or to compute the total retransmitted bytes in this sack recovery episode, the scoreboard is traversed. While in sack recovery, this traversal occurs on every call to tcp_output(), every dupack and every partial ack. The scoreboard could potentially get quite large, making this traversal expensive.

  This change optimizes this by storing hints (for the next hole to retransmit and the total retransmitted bytes in this sack recovery episode) reducing the complexity to find these values from O(n) to constant time.

  The debug code that sanity checks the hints against the computed value will be removed eventually.

  Submitted by: Mohan Srinivasan, Noritoshi Demizu, Raja Mukerji.
  Notes: svn path=/head/; revision=146123
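The hint idea described above can be sketched as follows. This is an illustration under assumptions, not the committed code: `struct toy_hole`, `struct toy_scoreboard` and both functions are hypothetical names, and the hole list is a plain singly linked list. The cached fields are updated incrementally at retransmit time, and the slow function plays the role of the sanity check mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>

struct toy_hole {
	unsigned start, end;	/* missing byte range [start, end) */
	unsigned rxmit;		/* next byte to retransmit, start <= rxmit <= end */
	struct toy_hole *next;
};

struct toy_scoreboard {
	struct toy_hole *head;
	struct toy_hole *nexthole;	/* hint: first hole with data left to resend */
	unsigned rexmit_bytes;		/* hint: bytes retransmitted in this episode */
};

/* O(n) reference computation, the traversal the hints are meant to avoid. */
unsigned
toy_rexmit_bytes_slow(const struct toy_scoreboard *sb)
{
	unsigned total = 0;

	for (const struct toy_hole *h = sb->head; h != NULL; h = h->next)
		total += h->rxmit - h->start;
	return (total);
}

/* Retransmit up to len bytes from the hinted hole and update both hints. */
unsigned
toy_retransmit(struct toy_scoreboard *sb, unsigned len)
{
	struct toy_hole *h = sb->nexthole;
	unsigned left;

	if (h == NULL)
		return (0);
	assert(h->rxmit <= h->end);
	left = h->end - h->rxmit;
	if (len > left)
		len = left;
	h->rxmit += len;
	sb->rexmit_bytes += len;
	/* Move the hint past holes that have been fully retransmitted. */
	while (sb->nexthole != NULL && sb->nexthole->rxmit == sb->nexthole->end)
		sb->nexthole = sb->nexthole->next;
	return (len);
}
```

Reading `sb->nexthole` and `sb->rexmit_bytes` is constant time; the cost of keeping them current is paid in small steps inside `toy_retransmit()`.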
* - Make the sack scoreboard logic use the TAILQ macros. This improves | Paul Saab | 2005-04-21 | 1 | -5/+2
    code readability and facilitates some anticipated optimizations in tcp_sack_option().
  - Remove tcp_print_holes() and TCP_SACK_DEBUG.

  Submitted by: Raja Mukerji.
  Reviewed by: Mohan Srinivasan, Noritoshi Demizu.
  Notes: svn path=/head/; revision=145370
* Ignore ICMP Source Quench messages for TCP sessions. Source Quench is | Andre Oppermann | 2005-04-21 | 1 | -2/+0
  ineffective, deprecated and can be abused to degrade the performance of active TCP sessions if spoofed.

  Replace a bogus call to tcp_quench() in tcp_output() with the direct equivalent tcpcb variable assignment.

  Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.1
  MFC after: 3 days
  Notes: svn path=/head/; revision=145355
* Fix for a TCP SACK bug where more than (win/2) bytes could have been | Paul Saab | 2005-04-14 | 1 | -0/+1
  in flight in SACK recovery.

  Found by: Noritoshi Demizu
  Submitted by: Mohan Srinivasan <mohans at yahoo-inc dot com>
                Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>
                Raja Mukerji <raja at moselle dot com>
  Notes: svn path=/head/; revision=145087
* Add limits on the number of elements in the sack scoreboard both | Paul Saab | 2005-03-09 | 1 | -0/+1
  per-connection and globally. This eliminates potential DoS attacks where SACK scoreboard elements tie up too much memory.

  Submitted by: Raja Mukerji (raja at moselle dot com).
  Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com).
  Notes: svn path=/head/; revision=143339
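A two-level limit of this kind can be sketched as a pair of counters checked at allocation time. Everything here is an assumption made for illustration: the names, the tiny limit values, and the absence of locking; the message does not state the real limits or how they are enforced.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative limits only; the commit message gives no actual values. */
enum { TOY_HOLES_PER_CONN_MAX = 4, TOY_HOLES_GLOBAL_MAX = 6 };

struct toy_conn {
	int num_holes;		/* scoreboard elements held by this connection */
};

int toy_global_holes;		/* scoreboard elements held by all connections */

/* Account for one new scoreboard element, refusing if either limit is hit. */
bool
toy_hole_alloc(struct toy_conn *c)
{
	if (c->num_holes >= TOY_HOLES_PER_CONN_MAX ||
	    toy_global_holes >= TOY_HOLES_GLOBAL_MAX)
		return (false);
	c->num_holes++;
	toy_global_holes++;
	return (true);
}

/* Release one scoreboard element. */
void
toy_hole_free(struct toy_conn *c)
{
	assert(c->num_holes > 0 && toy_global_holes > 0);
	c->num_holes--;
	toy_global_holes--;
}
```

The per-connection check keeps a single peer from growing its own scoreboard without bound, and the global check caps the total memory that all scoreboards together can consume.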
* Remove 2 (SACK) fields from the tcpcb. These are only used by a | Paul Saab | 2005-02-17 | 1 | -3/+1
  function that is called from tcp_input(), so they oughta be passed on the stack instead of stuck in the tcpcb.

  Submitted by: Mohan Srinivasan
  Notes: svn path=/head/; revision=142031
* o Add handling of an IPv4-mapped IPv6 address. | Maxim Konovalov | 2005-02-14 | 1 | -5/+0
  o Use SYSCTL_IN() macro instead of direct call of copyin(9).

  Submitted by: ume

  o Move sysctl_drop() implementation to sys/netinet/tcp_subr.c where most of tcp sysctls live.
  o There are net.inet[6].tcp[6].getcred sysctls already, so there is no need for a separate struct tcp_ident_mapping.

  Suggested by: ume
  Notes: svn path=/head/; revision=141886
* o Implement net.inet.tcp.drop sysctl and userland part, tcpdrop(8) | Maxim Konovalov | 2005-02-06 | 1 | -1/+7
  utility:

   The tcpdrop command drops the TCP connection specified by the local address laddr, port lport and the foreign address faddr, port fport.

  Obtained from: OpenBSD
  Reviewed by: rwatson (locking), ru (man page), -current
  MFC after: 1 month
  Notes: svn path=/head/; revision=141381
* /* -> /*- for license, minor formatting changes | Warner Losh | 2005-01-07 | 1 | -1/+1
  Notes: svn path=/head/; revision=139823
* Remove the now unused tcp_canceltimers() function. tcpcb timers are | Robert Watson | 2004-12-23 | 1 | -1/+0
  now stopped as part of tcp_discardcb().

  MFC after: 2 weeks
  Notes: svn path=/head/; revision=139220
* Remove RFC1644 T/TCP support from the TCP side of the network stack. | Andre Oppermann | 2004-11-02 | 1 | -40/+0
  A complete rationale and discussion is given in this message and the resulting discussion:

   http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706

  Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality.

  Discussed on: -arch
  Notes: svn path=/head/; revision=137139
* Shave 40 unused bytes from struct tcpcb. | Andre Oppermann | 2004-10-22 | 1 | -1/+0
  Notes: svn path=/head/; revision=136792
* - Estimate the amount of data in flight in sack recovery and use it | Paul Saab | 2004-10-05 | 1 | -1/+4
    to control the packets injected while in sack recovery (for both retransmissions and new data).
  - Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c.
  - Add a new sysctl (net.inet.tcp.sack.initburst) that controls the number of sack retransmissions done upon initiation of sack recovery.

  Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>
  Notes: svn path=/head/; revision=136151
* White space cleanup for netinet before branch: | Robert Watson | 2004-08-16 | 1 | -13/+13
  - Trailing tab/space cleanup
  - Remove spurious spaces between or before tabs

  This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET.

  Approved by: re (scottl)
  Submitted by: Xin LI <delphij@frontfree.net>
  Notes: svn path=/head/; revision=133874
* The tcp syncache code was leaving the IPv6 flowlabel uninitialised | David Malone | 2004-07-17 | 1 | -0/+1
  for the SYN|ACK packet and then letting in6_pcbconnect set the flowlabel later. Arrange for the syncache/syncookie code to set and recall the flow label so that the flowlabel used for the SYN|ACK is consistent. This is done by using some of the cookie (when tcp cookies are enabled) and by stashing the flowlabel in syncache.

  Tested and Discovered by: Orla McGann <orly@cnri.dit.ie>
  Approved by: ume, silby
  MFC after: 1 month
  Notes: svn path=/head/; revision=132307
* Whitespace. | Bruce M Simpson | 2004-06-25 | 1 | -3/+3
  Notes: svn path=/head/; revision=131078
* Add support for TCP Selective Acknowledgements. The work for this | Paul Saab | 2004-06-23 | 1 | -1/+48
  originated on RELENG_4 and was ported to -CURRENT.

  The scoreboarding code was obtained from OpenBSD, and many of the remaining changes were inspired by OpenBSD, but not taken directly from there.

  You can enable/disable sack using net.inet.tcp.do_sack. You can also limit the number of sack holes that all senders can have in the scoreboard with net.inet.tcp.sackhole_limit.

  Reviewed by: gnn
  Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)
  Notes: svn path=/head/; revision=130989
* Tighten up reset handling in order to make reset attacks as difficult as | Mike Silbersack | 2004-04-26 | 1 | -0/+1
  possible while maintaining compatibility with the widest range of TCP stacks.

  The algorithm is as follows:

  ---
  For connections in the ESTABLISHED state, only resets with sequence numbers exactly matching last_ack_sent will cause a reset, all other segments will be silently dropped.

  For connections in all other states, a reset anywhere in the window will cause the connection to be reset. All other segments will be silently dropped.
  ---

  The necessity of accepting all in-window resets was discovered by jayanth and jlemon, both of whom have seen TCP stacks that will respond to FIN-ACK packets with resets not meeting the strict last_ack_sent check.

  Idea by: Darren Reed
  Reviewed by: truckman, jlemon, others(?)
  Notes: svn path=/head/; revision=128653
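The acceptance rule quoted above can be written as a small predicate. This is a sketch under stated assumptions rather than the committed code: the function and enum names are hypothetical, and "in the window" is taken to mean the half-open range [last_ack_sent, last_ack_sent + rcv_wnd) in modulo-2^32 sequence space.

```c
#include <stdint.h>

/* Only the distinction the rule needs; not the full TCP state set. */
enum toy_state { TOY_ESTABLISHED, TOY_OTHER };

/* Sequence-number comparisons modulo 2^32. */
static int
seq_geq(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) >= 0);
}

static int
seq_lt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}

/*
 * Return 1 if a segment carrying RST should reset the connection,
 * 0 if it should be silently dropped.
 */
int
toy_rst_acceptable(enum toy_state state, uint32_t seg_seq,
    uint32_t last_ack_sent, uint32_t rcv_wnd)
{
	if (state == TOY_ESTABLISHED)
		return (seg_seq == last_ack_sent);	/* exact match only */
	/* Assumed window definition; see the note above. */
	return (seq_geq(seg_seq, last_ack_sent) &&
	    seq_lt(seg_seq, last_ack_sent + rcv_wnd));
}
```

In the ESTABLISHED case an off-path attacker has to guess one exact 32-bit value, whereas the in-window rule for other states keeps compatibility with the stacks mentioned in the message.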
* Fix a typo in a comment. | Bruce M Simpson | 2004-04-20 | 1 | -1/+1
  Notes: svn path=/head/; revision=128493