path: root/sys/netinet/tcp_input.c
Commit message | Author | Age | Files | Lines
* pass pcb rather than so. it is expected that per socket policy | Hajimu UMEMOTO | 2004-02-03 | 1 | -2/+2
  works again.
  Notes: svn path=/head/; revision=125396
* Merge from DragonFlyBSD rev 1.10: | Jeffrey Hsu | 2004-01-20 | 1 | -6/+5
  date: 2003/09/02 10:04:47; author: hsu; state: Exp; lines: +5 -6
  Account for when Limited Transmit is not congestion window limited.
  Obtained from: DragonFlyBSD
  Notes: svn path=/head/; revision=124761
* Limiters and sanity checks for TCP MSS (maximum segment size) | Andre Oppermann | 2004-01-08 | 1 | -0/+60
  resource exhaustion attacks.

  For network link optimization TCP can adjust its MSS and thus packet size according to the observed path MTU. This is done dynamically based on feedback from the remote host and network components along the packet path. This information can be abused to feign an extremely low path MTU.

  The resource exhaustion works in two ways:

  o During TCP connection setup the advertised local MSS is exchanged between the endpoints. The remote endpoint can set this arbitrarily low (except for a minimum MTU of 64 octets enforced in the BSD code). When the local host is sending data it is forced to send many small IP packets instead of a large one. For example, instead of the normal TCP payload size of 1448 it forces a TCP payload size of 12 (MTU 64), and thus we have a 120 times increase in workload and packets. On fast links this quickly saturates the local CPU and may also hit pps processing limits of network components along the path. This type of attack is particularly effective for servers where the attacker can download large files (WWW and FTP). We mitigate it by enforcing a minimum MTU settable by sysctl net.inet.tcp.minmss, defaulting to 256 octets.

  o The local host is receiving data on a TCP connection from the remote host. The local host has no control over the packet size the remote host is sending. The remote host may choose to do what is described in the first attack and send the data in packets with a TCP payload as small as one byte. For each packet the tcp_input() function is entered, the packet is processed and a sowakeup() is signalled to the connected process. For example, an attack with 2 Mbit/s gives 4716 packets per second and the same number of sowakeup()s to the process (and context switches). This type of attack is particularly effective for servers where the attacker can upload large amounts of data. Normally this is the case with WWW servers where large POSTs can be made. We mitigate this by calculating the average MSS payload per second. If it goes below 'net.inet.tcp.minmss' and the pps rate is above 'net.inet.tcp.minmssoverload' (defaulting to 1000), this particular TCP connection is reset and dropped.

  MITRE CVE: CAN-2004-0002
  Reviewed by: sam (mentor)
  MFC after: 1 day
  Notes: svn path=/head/; revision=124258
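The figures quoted in the commit message above can be reproduced with a little arithmetic. The sketch below is not the committed code: the function names are hypothetical, and the 53-byte packet size (20-byte IP header, 20-byte TCP header, 12 bytes of TCP options, 1 byte of payload) is an assumption chosen because it is consistent with the 1448 and 12 byte payload figures and yields the quoted 4716 packets per second.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of packets needed to carry `bytes` at `payload` bytes per packet. */
uint64_t
packets_needed(uint64_t bytes, uint32_t payload)
{
	return (bytes + payload - 1) / payload;	/* round up */
}

/* Packets per second that fit on a link, for packets of `pkt_bytes` each. */
uint64_t
packets_per_second(uint64_t bits_per_sec, uint32_t pkt_bytes)
{
	return bits_per_sec / 8 / pkt_bytes;
}

/*
 * Hypothetical overload test modelled on the description above:
 * flag a connection when the packet rate exceeds `minmssoverload`
 * and the average payload per packet falls below `minmss`.
 * The counters are assumed to cover the last second.
 */
bool
mss_overload(uint64_t bytes_last_sec, uint64_t pkts_last_sec,
    uint32_t minmss, uint32_t minmssoverload)
{
	if (pkts_last_sec <= minmssoverload)
		return false;	/* rate is acceptable; also rules out pkts == 0 */
	return bytes_last_sec / pkts_last_sec < minmss;
}
```

With these helpers, sending 173760 bytes takes 120 packets at a 1448-byte payload and 14480 packets at a 12-byte payload, which is the roughly 120-fold increase the message describes.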
* Enable the following TCP options by default to give them more exposure: | Andre Oppermann | 2004-01-06 | 1 | -2/+2
    rfc3042   Limited Transmit
    rfc3390   Increasing TCP's initial congestion window
    inflight  TCP inflight bandwidth limiting
  All my production servers have them enabled and there have been no issues. I am confident about having them on by default and it gives us better overall TCP performance.
  Reviewed by: sam (mentor)
  Notes: svn path=/head/; revision=124199
* Restructure a too broad ifdef which was disabling the setting of the | Andre Oppermann | 2003-11-25 | 1 | -2/+4
  tcp flightsize sysctl value for local networks in the !INET6 case.
  Approved by: re (scottl)
  Notes: svn path=/head/; revision=122987
* Introduce tcp_hostcache and remove the tcp specific metrics from | Andre Oppermann | 2003-11-20 | 1 | -144/+200
  the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache.

  It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve.

  tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address.

  It removes significant locking requirements from the tcp stack with regard to the routing table.

  Reviewed by: sam (mentor), bms
  Reviewed by: -net, -current, core@kame.net (IPv6 parts)
  Approved by: re (scottl)
  Notes: svn path=/head/; revision=122922
* Introduce a MAC label reference in 'struct inpcb', which caches | Robert Watson | 2003-11-18 | 1 | -2/+2
  the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer.

  This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check.

  For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel(), to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update.

  Biba, LOMAC, and MLS implement these entry points, as do the stub policy and test policy.

  Reviewed by: sam, bms
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  Notes: svn path=/head/; revision=122875
* dropwithreset is not needed in this case as tcp_drop() is already notifying | Andre Oppermann | 2003-11-12 | 1 | -1/+1
  the other side. Before we were sending two RST packets.
  Notes: svn path=/head/; revision=122576
* o correct locking problem: the inpcb must be held across tcp_respond | Sam Leffler | 2003-11-08 | 1 | -3/+3
  o add assertions in tcp_respond to validate inpcb locking assumptions
  o use local variable instead of chasing pointers in tcp_respond
  Supported by: FreeBSD Foundation
  Notes: svn path=/head/; revision=122327
* speedup stream socket recv handling by tracking the tail of | Sam Leffler | 2003-10-28 | 1 | -3/+3
  the mbuf chain instead of walking the list for each append
  Submitted by: ps/jayanth
  Obtained from: netbsd (jason thorpe)
  Notes: svn path=/head/; revision=121628
* enclose IPv6 part with ifdef INET6. | Hajimu UMEMOTO | 2003-10-20 | 1 | -2/+3
  Obtained from: KAME
  Notes: svn path=/head/; revision=121285
* correct linkmtu handling. | Hajimu UMEMOTO | 2003-10-20 | 1 | -2/+11
  Obtained from: KAME
  Notes: svn path=/head/; revision=121283
* - add dom_if{attach,detach} framework. | Hajimu UMEMOTO | 2003-10-17 | 1 | -2/+1
  - transition to use ifp->if_afdata.
  Obtained from: KAME
  Notes: svn path=/head/; revision=121161
* A number of patches in the last years have created new return paths | Hartmut Brandt | 2003-08-13 | 1 | -0/+21
  in tcp_input that leave the function before hitting the tcp_trace function call for the TCPDEBUG option. This has made TCPDEBUG mostly useless (and tools like ports/benchmarks/dbs not working). Add tcp_trace calls to the return paths that could be identified in this maze.
  This is a NOP unless you compile with TCPDEBUG.
  Notes: svn path=/head/; revision=118861
* Unify the "send high" and "recover" variables as specified in the | Jeffrey Hsu | 2003-07-15 | 1 | -19/+24
  latest rev of the spec. Use an explicit flag for Fast Recovery. [1]
  Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2]
  Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com>
  Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2]
  Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1]
  Notes: svn path=/head/; revision=117650
* Add /* FALLTHROUGH */ | Poul-Henning Kamp | 2003-05-31 | 1 | -0/+1
  Found by: FlexeLint
  Notes: svn path=/head/; revision=115503
* Correct a bug introduced with reduced TCP state handling; make | Robert Watson | 2003-05-07 | 1 | -1/+1
  sure that the MAC label on TCP responses during TIMEWAIT is properly set from either the socket (if available), or the mbuf that it's responding to.

  Unfortunately, this is made somewhat difficult by the TCP code, as tcp_twstart() calls tcp_twrespond() after discarding the socket but without a reference to the mbuf that causes the "response". Passing both the socket and the mbuf works around this--eventually it might be good to make sure the mbuf always gets passed in in "response" scenarios but working through this proved to complicate things too much.

  Approved by: re (scottl)
  Reviewed by: hsu
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  Notes: svn path=/head/; revision=114794
* Explicitly declare 'int' parameters. | David E. O'Brien | 2003-04-21 | 1 | -0/+1
  Notes: svn path=/head/; revision=113799
* Observe conservation of packets when entering Fast Recovery while | Jeffrey Hsu | 2003-04-01 | 1 | -3/+21
  doing Limited Transmit. Only artificially inflate the congestion window by 1 segment instead of the usual 3 to take into account the 2 already sent by Limited Transmit.
  Approved in principle by: Mark Allman <mallman@grc.nasa.gov>, Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>
  Notes: svn path=/head/; revision=112957
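The window adjustment described in the commit above can be sketched as follows. This is an illustration only, not the committed change: the function name and the boolean parameter are hypothetical, and the flag is assumed to mean that Limited Transmit has already sent its two extra segments.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Congestion window on entering Fast Recovery (third duplicate ACK).
 * Classic behaviour inflates ssthresh by 3 segments; if Limited
 * Transmit already put 2 new segments on the wire, inflate by 1 only
 * so the number of packets in flight is conserved.
 */
uint32_t
fast_recovery_cwnd(uint32_t ssthresh, uint32_t maxseg,
    bool limited_transmit_used)
{
	uint32_t inflate = limited_transmit_used ? 1 : 3;

	return ssthresh + inflate * maxseg;
}
```

For ssthresh of four 1460-byte segments, the window becomes five segments (7300 bytes) with Limited Transmit and seven segments (10220 bytes) without it.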
* Greatly simplify the unlocking logic by holding the TCP protocol lock until | Jeffrey Hsu | 2003-03-13 | 1 | -8/+2
  after FIN_WAIT_2 processing.
  Helped with debugging: Doug Barton
  Notes: svn path=/head/; revision=112191
* Add support for RFC 3390, which allows for a variable-sized | Jeffrey Hsu | 2003-03-13 | 1 | -2/+9
  initial congestion window.
  Notes: svn path=/head/; revision=112171
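For reference, RFC 3390 sets the upper bound on the initial window to min(4*MSS, max(2*MSS, 4380 bytes)). The sketch below computes that bound; the function and helper names are hypothetical and it says nothing about how tcp_input.c wires the value in.

```c
#include <stdint.h>

static uint32_t
min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

static uint32_t
max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Upper bound on the initial congestion window per RFC 3390, in bytes. */
uint32_t
rfc3390_initial_window(uint32_t mss)
{
	return min_u32(4 * mss, max_u32(2 * mss, 4380));
}
```

An MSS of 1460 gives 4380 bytes (three segments), an MSS of 536 gives 2144 bytes (four segments), and an MSS of 2200 gives 4400 bytes (two segments).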
* Implement the Limited Transmit algorithm (RFC 3042). | Jeffrey Hsu | 2003-03-12 | 1 | -0/+14
  Notes: svn path=/head/; revision=112162
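As a reminder of what RFC 3042 permits: on each of the first two duplicate ACKs the sender may transmit one new segment, provided the receiver's advertised window allows it and the outstanding data stays within the congestion window plus two segments. The check below is a sketch under those rules; the function name and parameters are hypothetical and it is not taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * May one new segment of `smss` bytes be sent under Limited Transmit?
 * `flight` is the amount of data currently outstanding, `cwnd` the
 * congestion window and `rwnd` the receiver's advertised window.
 */
bool
limited_transmit_ok(unsigned int dupacks, uint32_t flight, uint32_t cwnd,
    uint32_t rwnd, uint32_t smss)
{
	uint64_t after_send = (uint64_t)flight + smss;

	if (dupacks != 1 && dupacks != 2)
		return false;	/* only the first two duplicate ACKs qualify */
	if (after_send > rwnd)
		return false;	/* receiver window does not allow it */
	return after_send <= (uint64_t)cwnd + 2 * (uint64_t)smss;
}
```

Note that the congestion window itself is left unchanged; only the send test is relaxed by two segments.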
* Remove a panic(); if the zone allocator can't provide more timewait | Jonathan Lemon | 2003-03-08 | 1 | -4/+3
  structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer.
  Sponsored by: DARPA, NAI Labs
  Notes: svn path=/head/; revision=112009
* In timewait state, if the incoming segment is a pure in-sequence ack | Jonathan Lemon | 2003-02-26 | 1 | -2/+4
  that matches snd_max, then do not respond with an ack, just drop the segment. This fixes a problem where a simultaneous close results in an ack loop between two time-wait states.
  Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG>
  Sponsored by: DARPA, NAI Labs
  Notes: svn path=/head/; revision=111560
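The rule described in the commit above amounts to a single predicate. The sketch below is an assumption about its shape, not the committed code: the function name and parameter list are hypothetical, and only the TCP header flag values (FIN 0x01, ACK 0x10) are standard.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TH_FIN	0x01	/* standard TCP FIN flag bit */
#define SKETCH_TH_ACK	0x10	/* standard TCP ACK flag bit */

/*
 * In TIME-WAIT, a segment that carries only ACK, has no payload, is in
 * sequence and acknowledges exactly snd_max is dropped without a reply.
 * Everything else is answered with an ACK.
 */
bool
timewait_drop_silently(uint8_t thflags, uint32_t tlen, uint32_t th_seq,
    uint32_t th_ack, uint32_t rcv_nxt, uint32_t snd_max)
{
	return thflags == SKETCH_TH_ACK && tlen == 0 &&
	    th_seq == rcv_nxt && th_ack == snd_max;
}
```

Because neither side answers such a segment, two peers that are both in TIME-WAIT after a simultaneous close stop bouncing ACKs back and forth.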
* The TCP protocol lock may still be held if the reassembly queue dropped FIN. | Jonathan Lemon | 2003-02-26 | 1 | -1/+2
  Detect this case and drop the lock accordingly.
  Sponsored by: DARPA, NAI Labs
  Notes: svn path=/head/; revision=111549
* tcp_twstart() needs to be called with the TCP protocol lock held to avoid | Jeffrey Hsu | 2003-02-24 | 1 | -6/+8
  a race condition with the TCP timer routines.
  Notes: svn path=/head/; revision=111389
* Pass the right function to callout_reset() for a compressed | Jeffrey Hsu | 2003-02-24 | 1 | -1/+1
  TIME-WAIT control block.
  Notes: svn path=/head/; revision=111386
* Yesterday just wasn't my day. Remove testing delta that crept into the diff. | Jonathan Lemon | 2003-02-23 | 1 | -1/+1
  Pointy hat provided by: sam
  Notes: svn path=/head/; revision=111319
* Check to see if the TF_DELACK flag is set before returning from | Jonathan Lemon | 2003-02-22 | 1 | -8/+7
  tcp_input(). This unbreaks delack handling, while still preserving correct T/TCP behavior.
  Tested by: maxim
  Sponsored by: DARPA, NAI Labs
  Notes: svn path=/head/; revision=111266
* Add a TCP TIMEWAIT state which uses less space than a full-blown TCP | Jonathan Lemon | 2003-02-19 | 1 | -30/+186
  control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket.
  Reviewed by: hsu, silby, jayanth
  Sponsored by: DARPA, NAI Labs
  Notes: svn path=/head/; revision=111145
* Correct comments. | Jonathan Lemon | 2003-02-19 | 1 | -7/+4
  Notes: svn path=/head/; revision=111140
* Clean up delayed acks and T/TCP interactions: | Jonathan Lemon | 2003-02-19 | 1 | -28/+27
  - delay acks for T/TCP regardless of delack setting
  - fix bug where a single pass through tcp_input might not delay acks
  - use callout_active() instead of callout_pending()
  Sponsored by: DARPA, NAI Labs
  Notes: svn path=/head/; revision=111139
* The protocol lock is always held in the dropafterack case, so we don't | Jeffrey Hsu | 2003-02-13 | 1 | -2/+2
  need to check for it at runtime.
  Notes: svn path=/head/; revision=110830
* Add the TCP flags to the log message whenever log_in_vain is 1, not | Crist J. Clark | 2003-02-02 | 1 | -8/+3
  just when set to 2.
  PR: kern/43348
  MFC after: 5 days
  Notes: svn path=/head/; revision=110251
* Fix NewReno. | Jeffrey Hsu | 2003-01-13 | 1 | -41/+44
  Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
  Notes: svn path=/head/; revision=109175
* Remove the PAWS ack-on-ack debugging printf(). | Matthew Dillon | 2002-12-30 | 1 | -5/+2
  Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of order / reverse-time-indexed packet should be acknowledged as specified in RFC-793 page 69 then dropped. The original PAWS code in FreeBSD (1994) simply acknowledged the segment unconditionally, which is incorrect, and was fixed in 1.183 (2002). At the moment we do not do checks for SYN or FIN in addition to (tlen != 0), which may or may not be correct, but the worst that ought to happen should be a retry by the sender.
  Notes: svn path=/head/; revision=108464
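The PAWS behaviour discussed above can be summarised as a three-way decision. This sketch is an assumption built from the commit message: the names are hypothetical, the idle-connection exception that a full PAWS implementation needs is left out, and SYN/FIN are deliberately not examined, matching the note that only (tlen != 0) is checked.

```c
#include <stdbool.h>
#include <stdint.h>

enum paws_action { PAWS_ACCEPT, PAWS_DROP, PAWS_ACK_AND_DROP };

/* Wraparound-safe "a is older than b" for 32-bit timestamps. */
static bool
tstmp_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * A segment whose timestamp is older than the most recent one seen is
 * rejected. It is acknowledged only if it carries data, so a bare ACK
 * is never answered with another ACK.
 */
enum paws_action
paws_check(uint32_t ts_val, uint32_t ts_recent, uint32_t tlen)
{
	if (ts_recent == 0 || !tstmp_lt(ts_val, ts_recent))
		return PAWS_ACCEPT;
	return tlen != 0 ? PAWS_ACK_AND_DROP : PAWS_DROP;
}
```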
* Unravel a nested conditional. | Jeffrey Hsu | 2002-12-20 | 1 | -21/+12
  Remove an unneeded local variable.
  Notes: svn path=/head/; revision=108123
* Fix syntax in last commit. | Matthew Dillon | 2002-12-17 | 1 | -3/+3
  Notes: svn path=/head/; revision=107961
* Bruce forwarded this tidbit from an analysis Van Jacobson did on an | Matthew Dillon | 2002-12-14 | 1 | -1/+6
  apparent ack-on-ack problem with FreeBSD. Prof. Jacobson noticed a case in our TCP stack which would acknowledge a received ack-only packet, which is not legal in TCP.
  Submitted by: Van Jacobson <van@packetdesign.com>, bmah@packetdesign.com (Bruce A. Mah)
  MFC after: 7 days
  Notes: svn path=/head/; revision=107854
* a better solution to building FAST_IPSEC w/o INET6 | Sam Leffler | 2002-11-10 | 1 | -4/+0
  Submitted by: Jeffrey Hsu <hsu@FreeBSD.org>
  Notes: svn path=/head/; revision=106736
* fixup FAST_IPSEC build w/o INET6 | Sam Leffler | 2002-11-08 | 1 | -1/+4
  Notes: svn path=/head/; revision=106679
* - Consistently update snd_wl1, snd_wl2, and rcv_up in the header | Jeff Roberson | 2002-10-31 | 1 | -1/+15
  prediction code. Previously, 2GB worth of header predicted data could leave these variables too far out of sequence which would cause problems after receiving a packet that did not match the header prediction.
  Submitted by: Bill Baumann <bbaumann@isilon.com>
  Sponsored by: Isilon Systems, Inc.
  Reviewed by: hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com
  Notes: svn path=/head/; revision=106271
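The 2GB limit mentioned above follows from how TCP compares 32-bit sequence numbers: the difference is interpreted as a signed value, so the ordering is only meaningful while the two numbers are less than 2^31 apart. A minimal sketch of that comparison, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-safe "a precedes b" for 32-bit sequence numbers.
 * Correct only while a and b are less than 2^31 apart.
 */
bool
seq_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}
```

A variable left untouched at 1000 still compares as older than 1000 + 0x7fffffff, but once the stream has advanced to 1000 + 0x80000001 the stale value appears to lie ahead of the current sequence number. That is the failure mode a variable such as snd_wl1 runs into if it is not updated on the header prediction path.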
* Don't need to check if SO_OOBINLINE is defined. | Jeffrey Hsu | 2002-10-30 | 1 | -13/+8
  Don't need to protect isipv6 conditional with INET6.
  Fix leading indentation in 2 lines.
  Notes: svn path=/head/; revision=106198
* Tie new "Fast IPsec" code into the build. This involves the usual | Sam Leffler | 2002-10-16 | 1 | -0/+19
  configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implementation).
  As noted previously, don't use FAST_IPSEC with INET6 at the moment.
  Reviewed by: KAME, rwatson
  Approved by: silence
  Supported by: Vernier Networks
  Notes: svn path=/head/; revision=105199
* Replace aux mbufs with packet tags: | Sam Leffler | 2002-10-16 | 1 | -1/+1
  o instead of a list of mbufs use a list of m_tag structures a la openbsd
  o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie
  o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines
  o rewrite KAME use of aux mbufs in terms of packet tags
  o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets
  o bump __FreeBSD_version so code can be conditionalized
  o fixup ipfilter's call to ip_output based on __FreeBSD_version
  Reviewed by: julian, luigi (silent), -arch, -net, darren
  Approved by: julian, silence from everyone else
  Obtained from: openbsd (mostly)
  MFC after: 1 month
  Notes: svn path=/head/; revision=105194
* Guido found another bug. There is a situation with | Matthew Dillon | 2002-09-30 | 1 | -0/+1
  timestamped TCP packets where FreeBSD will send DATA+FIN and a W2K box will ack just the DATA portion. If this occurs after FreeBSD has done a (NewReno) fast-retransmit and is recovering it (dupacks > threshold) it triggers a case in tcp_newreno_partial_ack() (tcp_newreno() in stable) where tcp_output() is called with the expectation that the retransmit timer will be reloaded. But tcp_output() falls through and returns without doing anything, causing the persist timer to be loaded instead. This causes the connection to hang until W2K gives up.

  This occurs because in the case where only the FIN must be acked, the 'len' calculation in tcp_output() will be 0, a lot of checks will be skipped, and the FIN check will also be skipped because it is designed to handle FIN retransmits, not forced transmits from tcp_newreno().

  The solution is to simply set TF_ACKNOW before calling tcp_output() to absolutely guarantee that it will run the send code and reset the retransmit timer. TF_ACKNOW is already used for this purpose in other cases.

  For some unknown reason this patch also seems to greatly reduce the number of duplicate acks received when Guido runs his tests over a lossy network. It is quite possible that there are other tcp_newreno{_partial_ack()} cases which were not generating the expected output which this patch also fixes.

  X-MFC after: Will be MFC'd after the freeze is over
  Notes: svn path=/head/; revision=104226
* Fix issue where shutdown(socket, SHUT_RD) was effectively | Mike Silbersack | 2002-09-22 | 1 | -3/+10
  ignored for TCP sockets.
  NetBSD PR: 18185
  Submitted by: Sean Boudreau <seanb@qnx.com>
  MFC after: 3 days
  Notes: svn path=/head/; revision=103776
* Guido reported an interesting bug where an FTP connection between a | Matthew Dillon | 2002-09-17 | 1 | -5/+23
  Windows 2000 box and a FreeBSD box could stall. The problem turned out to be a timestamp reply bug in the W2K TCP stack. FreeBSD sends a timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK causing FreeBSD to calculate an insane SRTT and RTT, resulting in a maximal retransmit timeout (60 seconds). If there is any packet loss on the connection for the first six or so packets the retransmit case may be hit (the window will still be too small for fast-retransmit), causing a 60+ second pause. The W2K box gives up and closes the connection.

  This commit works around the W2K bug.

    15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8]
    15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF)

  Bug reported by: Guido van Rooij <guido@gvr.org>
  Notes: svn path=/head/; revision=103505
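To see why an echoed timestamp of 0 is harmful, consider how an RTT sample is derived from the timestamp option: the sender subtracts the echoed value from its current clock. The sketch below shows one plausible guard and is an assumption, not the committed workaround; the function name is hypothetical and the sample is simplified to a plain difference in clock ticks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Derive an RTT sample from a timestamp echo reply. An echo of 0 is
 * treated as "no usable echo" and no sample is produced, so a broken
 * peer cannot inflate the retransmit timer.
 */
bool
rtt_from_echo(uint32_t now, uint32_t tsecr, uint32_t *rtt)
{
	if (tsecr == 0)
		return false;	/* ignore the bogus echo */
	*rtt = now - tsecr;
	return true;
}
```

Using the timestamp from the trace above, a correct echo of 188297344 received three ticks later yields a sample of 3. Without the guard, the echo of 0 in the SYN+ACK would yield a sample of about 188 million ticks.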
* Replace various spelling with FALLTHROUGH which is lint()able | Philippe Charnier | 2002-08-25 | 1 | -1/+1
  Notes: svn path=/head/; revision=102412
* Enclose IPv6 addresses in brackets when they are printed together with a | Juli Mallett | 2002-08-19 | 1 | -3/+7
  TCP/UDP port separated by a colon. This is for the log_in_vain facility.
  Pointed out by: Edward J. M. Brocklesby
  Reviewed by: ume
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=102131
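The bracket convention in the commit above removes the ambiguity between the colons inside an IPv6 address and the colon before the port. A small userland sketch of the formatting follows; the function name is hypothetical, and detecting IPv6 by the presence of a colon in the address string is a simplification made for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "addr:port", wrapping the address in brackets when it looks
 * like an IPv6 literal (contains a colon). Returns what snprintf()
 * returns: the length the full string needs, excluding the NUL.
 */
int
format_endpoint(char *buf, size_t len, const char *addr, unsigned int port)
{
	if (strchr(addr, ':') != NULL)
		return snprintf(buf, len, "[%s]:%u", addr, port);
	return snprintf(buf, len, "%s:%u", addr, port);
}
```

Called with "2001:db8::1" and port 80 this produces "[2001:db8::1]:80", while "192.0.2.1" and port 80 produce "192.0.2.1:80".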