path: root/sys/netinet/tcp_syncache.c
Commit message | Author | Age | Files | Lines
...
* Syncookies can be used in combination with the syncache. If the cache | Michael Tuexen | 2017-04-20 | 1 | -5/+21
  overflows, syncookies are used. This patch restricts the usage of syncookies in this case: accept syncookies only if there was an overflow of the syncache recently. This mitigates a problem reported in PR217637, where a syncookie was accepted without any recent drops. Thanks to glebius@ for suggesting an improvement.
  PR: 217637
  Reviewed by: gnn, glebius
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D10272
  Notes: svn path=/head/; revision=317208
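The acceptance rule described in this commit can be sketched as below. This is a minimal illustration, not the kernel code: the `bucket_state` structure, the function names, and the 16-second window are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window (seconds) during which an overflow counts as "recent". */
#define OVERFLOW_WINDOW_SECS 16

/* Hypothetical per-bucket state: uptime of the last syncache overflow. */
struct bucket_state {
	int64_t last_overflow;	/* negative means "never overflowed" */
};

/* Record that an entry was dropped because the cache was full. */
static void
note_overflow(struct bucket_state *b, int64_t now)
{
	b->last_overflow = now;
}

/* Accept a returning syncookie only if an overflow happened recently. */
static bool
syncookie_acceptable(const struct bucket_state *b, int64_t now)
{
	assert(now >= 0);
	if (b->last_overflow < 0)
		return (false);
	return ((now - b->last_overflow) <= OVERFLOW_WINDOW_SECS);
}
```

With this rule, a cookie that arrives when no entry was dropped recently is rejected, which is the situation PR 217637 describes.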
* Hide struct inpcb, struct tcpcb from the userland. | Gleb Smirnoff | 2017-03-21 | 1 | -7/+7
  This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas; on the other hand, we still eventually modify them, and tools like netstat(1) never work on the next version of FreeBSD. We maintain a ton of spares in them, and we already have some ifdef hell at the end of tcpcb.
  Details:
  - Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
  - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space.
  - Make kernel and userland utilities compilable after these changes.
  - Bump __FreeBSD_version.
  Reviewed by: rrs, gnn
  Differential Revision: D10018
  Notes: svn path=/head/; revision=315662
* Merge projects/ipsec into head/. | Andrey V. Elsukov | 2017-02-06 | 1 | -54/+95
  Small summary
  -------------
  o Almost all IPsec related code was moved into sys/netipsec.
  o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules.
  o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA.
  o New network pseudo interface if_ipsec(4) added. For now it is built as part of the ipsec.ko module (or with an IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs.
  o The network stack now invokes IPsec functions using special methods. Only one header file, <netipsec/ipsec_support.h>, should be included to declare all the needed things to work with IPsec.
  o All IPsec protocol handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods.
  o TCP_SIGNATURE support was reworked to be closer to the RFC.
  o PF_KEY SADB was reworked:
    - now all security associations are stored in a single SPI namespace, and all SAs MUST have a unique SPI.
    - several hash tables added to speed up lookups in SADB.
    - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups at the same time.
    - many PF_KEY message handlers were reworked to reflect changes in SADB.
    - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by an IKE daemon to change SA addresses.
  o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only a limited number (4) of bundled SAs, but they are supported for both INET and INET6.
  o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet.
  o For inbound security policies added the mode in which the kernel checks the full history of applied IPsec transforms.
  o Reference counting rules for security policies and security associations were changed. The proper SA locking added into xform code.
  o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting.
  Reviewed by: gnn, wblock
  Obtained from: Yandex LLC
  Relnotes: yes
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D9352
  Notes: svn path=/head/; revision=313330
* Add a knob to change default behavior of inheriting listen socket's tcp stack | Hiren Panchasara | 2017-01-27 | 1 | -1/+9
  regardless of what the default stack for the system is set to. With current/default behavior, after changing the default tcp stack, the application needs to be restarted to pick up that change. Setting this new knob net.inet.tcp.functions_inherit_listen_socket_stack to '0' would change that behavior and make any new connection use the newly selected default tcp stack.
  Reviewed by: rrs
  MFC after: 2 weeks
  Sponsored by: Limelight Networks
  Notes: svn path=/head/; revision=312907
* Remove assigned only variable. | Gleb Smirnoff | 2016-12-21 | 1 | -2/+1
  Notes: svn path=/head/; revision=310376
* For RTT calculations mid-session, we explicitly ignore ACKs with tsecr of 0 as | Hiren Panchasara | 2016-11-21 | 1 | -3/+10
  many broken middle-boxes tend to do that. But during 3whs, in syncache_expand(), we don't do that, which causes us to send a RST to such a client. Relax this constraint by only using tsecr to compare against the timestamp that we sent when it is not 0. As a result, we'd now accept the final ACK of 3whs with tsecr of 0.
  Reviewed by: jtl, gnn
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D8552
  Notes: svn path=/head/; revision=308943
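The relaxed check can be sketched as a small predicate. This is an illustration under assumptions, not the code in syncache_expand(): the function name and parameters are hypothetical, with `tsecr` standing for the echoed timestamp in the final ACK and `ts_sent` for the timestamp sent in the SYN|ACK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the timestamp echo in the final ACK of the handshake
 * is acceptable. A zero echo carries no information (some middle-boxes
 * clear it), so it is not treated as a mismatch.
 */
static bool
tsecr_acceptable(uint32_t tsecr, uint32_t ts_sent)
{
	if (tsecr == 0)
		return (true);
	return (tsecr == ts_sent);
}
```

A non-zero echo that differs from what was sent is still rejected; only the zero case is relaxed.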
* Remove an extraneous call to soisconnected() in syncache_socket(), | Julien Charbon | 2016-10-26 | 1 | -4/+0
  introduced with r261242. The useful and expected soisconnected() call is done in tcp_do_segment(). This was found as part of the unrelated PR:212920 investigation. Slightly improves (~2%) the maximum number of TCP accepts per second.
  Tested by: kevin.bowling_kev009.com, jch
  Approved by: gnn, hiren
  MFC after: 1 week
  Sponsored by: Verisign, Inc
  Differential Revision: https://reviews.freebsd.org/D8072
  Notes: svn path=/head/; revision=307966
* Fix cases where the TFO pending counter would leak references, and | Patrick Kelsey | 2016-10-15 | 1 | -10/+24
  eventually, memory. Also renamed some tfo labels and added/reworked comments for clarity. Based on an initial patch from jtl.
  PR: 213424
  Reviewed by: jtl
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D8235
  Notes: svn path=/head/; revision=307337
* The TFO server-side code contains some changes that are not conditioned on | Jonathan T. Looney | 2016-10-12 | 1 | -1/+1
  the TCP_RFC7413 kernel option. This change removes those few instructions from the packet processing path. While not strictly necessary, for the sake of consistency, I applied the new IS_FASTOPEN macro to all places in the packet processing path that used the (t_flags & TF_FASTOPEN) check.
  Reviewed by: hiren
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D8219
  Notes: svn path=/head/; revision=307153
* Fix an issue with accept_filter introduced with r261242: | Julien Charbon | 2016-09-29 | 1 | -1/+3
  As a side effect of r261242 when using accept_filter the first call to soisconnected() is done earlier in tcp_input() instead of tcp_do_segment() context. Restore the expected behaviour.
  Note: This call to soisconnected() seems to be extraneous in all cases (with or without accept_filter). Will be addressed in a separate commit.
  PR: 212920
  Reported by: Alexey
  Tested by: Alexey, jch
  Sponsored by: Verisign, Inc.
  MFC after: 1 week
  Notes: svn path=/head/; revision=306443
* Here we update the modular tcp to be able to switch to an | Randall Stewart | 2016-08-16 | 1 | -1/+1
  alternate TCP stack in states other than the closed state (pre-listen/connect). The idea is that *if* that is supported by the alternate stack, it is asked if it's OK to switch. If it approves the "handoff" then we allow the switch to happen. Also the fini() function now gets a flag to tell if you are switching away *or* the tcb is destroyed. The init() call into the alternate stack is moved to the end so the tcb is more fully formed before the init transpires.
  Sponsored by: Netflix Inc.
  Differential Revision: D6790
  Notes: svn path=/head/; revision=304223
* tcp/syncache: Add comment for syncache_respond | Sepherosa Ziehau | 2016-05-10 | 1 | -0/+9
  Suggested by: hiren, hps
  Reviewed by: sbruno
  Sponsored by: Microsoft OSTC
  Differential Revision: https://reviews.freebsd.org/D6148
  Notes: svn path=/head/; revision=299315
* sys/net*: minor spelling fixes. | Pedro F. Giffuni | 2016-05-03 | 1 | -2/+2
  No functional change.
  Notes: svn path=/head/; revision=298995
* tcp/syncache: Set flowid and hash type properly for SYN|ACK | Sepherosa Ziehau | 2016-04-29 | 1 | -5/+11
  So the underlying drivers can use it to select the sending queue properly for SYN|ACK instead of rolling their own hash.
  Sponsored by: Microsoft OSTC
  Differential Revision: https://reviews.freebsd.org/D6120
  Notes: svn path=/head/; revision=298769
* Indentation issues. | Pedro F. Giffuni | 2016-04-20 | 1 | -3/+2
  Contract some lines leftover from r298310. Mea culpa.
  Notes: svn path=/head/; revision=298354
* kernel: use our nitems() macro when it is available through param.h. | Pedro F. Giffuni | 2016-04-19 | 1 | -2/+2
  No functional change, only trivial cases are done in this sweep.
  Discussed in: freebsd-current
  Notes: svn path=/head/; revision=298310
* Mfp: r296309 | Bjoern A. Zeeb | 2016-04-09 | 1 | -2/+6
  While there is no dependency interaction, stopping the timer before freeing the rest of the resources seems more natural and avoids it being scheduled an extra time when it is no longer needed.
  Reviewed by: gnn, emaste
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D5733
  Notes: svn path=/head/; revision=297736
* Redo r294869. The array of counters for TCP states doesn't belong to | Gleb Smirnoff | 2016-03-15 | 1 | -3/+3
  struct tcpstat, because the structure can be zeroed out by netstat(1) -z, and of course running connection counts shouldn't be touched. Place running connection counts into separate array, and provide separate read-only sysctl oid for it.
  Notes: svn path=/head/; revision=296881
* Grab a snap amount of TCP connections in syncache from tcpstat. | Gleb Smirnoff | 2016-01-27 | 1 | -19/+0
  Notes: svn path=/head/; revision=294870
* Augment struct tcpstat with tcps_states[], which is used for book-keeping | Gleb Smirnoff | 2016-01-27 | 1 | -1/+12
  the amount of TCP connections by state. Provides a cheap way to get connection count without traversing the whole pcb list.
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=294869
* Implementation of server-side TCP Fast Open (TFO) [RFC7413]. | Patrick Kelsey | 2015-12-24 | 1 | -8/+123
  TFO is disabled by default in the kernel build. See the top comment in sys/netinet/tcp_fastopen.c for implementation particulars.
  Reviewed by: gnn, jch, stas
  MFC after: 3 days
  Sponsored by: Verisign, Inc.
  Differential Revision: https://reviews.freebsd.org/D4350
  Notes: svn path=/head/; revision=292706
* First cut of the modularization of our TCP stack. Still | Randall Stewart | 2015-12-16 | 1 | -0/+22
  to do is to clean up the timer handling using the async-drain. Other optimizations may be coming to go with this. What's here will allow different tcp implementations (one included).
  Reviewed by: jtl, hiren, transports
  Sponsored by: Netflix Inc.
  Differential Revision: D4055
  Notes: svn path=/head/; revision=292309
* Use Jenkins hash for TCP syncache. | Gleb Smirnoff | 2015-09-05 | 1 | -52/+20
  o Unlike xor, in Jenkins hash every bit of input affects virtually every bit of output, thus salting the hash actually works. With xor, salting only provides a false sense of security, since if hash(x) collides with hash(y), then of course, hash(x) ^ salt would also collide with hash(y) ^ salt. [1]
  o Jenkins provides much better distribution than xor, very close to ideal.
  TCP connection setup/teardown benchmark has shown a 10% increase with default hash size, and with bigger hashes that still provide possibility for collisions. With enormous hash size, when dataset is by an order of magnitude smaller than hash size, the benchmark has shown a 4% decrease in performance, which is expected and acceptable.
  Noticed by: Jeffrey Knockel <jeffk cs.unm.edu> [1]
  Benchmarks by: jch
  Reviewed by: jch, pkelsey, delphij
  Security: strengthens protection against hash collision DoS
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=287481
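The point about xor salting can be illustrated with a short sketch. Both functions are hypothetical stand-ins: `xor_hash` is a simplified xor fold, and `oaat_hash` is Jenkins' one-at-a-time function seeded with the salt, which is not necessarily the Jenkins variant the commit switches to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR-style hash: fold the fields together, then apply the salt. */
static uint32_t
xor_hash(uint32_t addr, uint16_t port, uint32_t salt)
{
	return ((addr ^ port) ^ salt);
}

/*
 * Jenkins one-at-a-time hash, with the salt used as the initial state
 * so that it is mixed into every later step (illustrative variant).
 */
static uint32_t
oaat_hash(const uint8_t *key, size_t len, uint32_t salt)
{
	uint32_t h = salt;

	for (size_t i = 0; i < len; i++) {
		h += key[i];
		h += h << 10;
		h ^= h >> 6;
	}
	h += h << 3;
	h ^= h >> 11;
	h += h << 15;
	return (h);
}
```

For example, (0x0a000001, 0x1234) and (0x0a000002, 0x1237) fold to the same value, so `xor_hash` returns equal results for both under every salt. With `oaat_hash` the salt takes part in the mixing, so a collision found under one salt is not guaranteed to carry over to another.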
* Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: | Julien Charbon | 2015-08-03 | 1 | -4/+16
  - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()).
  - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters.
  This allows using the TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections.
  PR: 183659
  Differential Revision: https://reviews.freebsd.org/D2599
  Reviewed by: jhb, adrian
  Tested by: adrian, nitroboost-gmail.com
  Sponsored by: Verisign, Inc.
  Notes: svn path=/head/; revision=286227
* Make syncookie_mac() use 'tcp_seq irs' in computing hash. | Hiren Panchasara | 2015-01-30 | 1 | -0/+1
  This fixes what seems like a simple oversight when the function was added in r253210.
  Reported by: Daniel Borkmann <dborkman@redhat.com>
               Florian Westphal <fw@strlen.de>
  Differential Revision: https://reviews.freebsd.org/D1628
  Reviewed by: gnn
  MFC after: 1 month
  Sponsored by: Limelight Networks
  Notes: svn path=/head/; revision=277938
* Start process of removing the use of the deprecated "M_FLOWID" flag | Hans Petter Selasky | 2014-12-01 | 1 | -3/+1
  from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file.
  This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows.
  "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before.
  Additional notes:
  - The SCTP code changes will be committed as a separate patch.
  - Removal of the "M_FLOWID" flag will also be done separately.
  - The FreeBSD version has been bumped.
  MFC after: 1 month
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=275358
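The transmit-side check described in this commit can be sketched in userland as follows. The `pkt` structure, the `HASHTYPE_*` constants, and `select_txq()` are hypothetical stand-ins for the mbuf packet header, the M_HASHTYPE_XXX macros, and a driver's queue selection; none of them is the kernel API itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for M_HASHTYPE_NONE and M_HASHTYPE_OPAQUE. */
enum hashtype {
	HASHTYPE_NONE = 0,	/* flowid is not valid */
	HASHTYPE_OPAQUE = 1	/* flowid is valid, no particular type */
};

/* Hypothetical stand-in for the relevant packet header fields. */
struct pkt {
	uint32_t	flowid;
	enum hashtype	rsstype;
};

/*
 * Pick a transmit queue: use the flowid only when the hash type says
 * it is valid, otherwise fall back to a caller-supplied default.
 */
static unsigned
select_txq(const struct pkt *p, unsigned nqueues, unsigned fallback)
{
	assert(nqueues > 0);
	if (p->rsstype != HASHTYPE_NONE)
		return (p->flowid % nqueues);
	return (fallback % nqueues);
}
```

The sender side then only assigns the flowid and hash type; the validity test lives in one place, in the consumer.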
* Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. | Gleb Smirnoff | 2014-11-07 | 1 | -8/+8
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=274225
* Fix typo. | Andrey V. Elsukov | 2014-10-31 | 1 | -1/+1
  Notes: svn path=/head/; revision=273903
* * Split tcp_signature_compute() into 2 pieces: | Alexander V. Chernikov | 2014-09-27 | 1 | -12/+34
    - tcp_get_sav() - SADB key lookup
    - tcp_signature_do_compute() - actual computation
  * Fix TCP signature case for listening socket: do not assume EVERY connection coming to socket with TCP_SIGNATURE set to be md5 signed regardless of SADB key existence for particular address. This fixes the case for routing software having _some_ BGP sessions secured by md5.
  * Simplify TCP_SIGNATURE handling in tcp_input()
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=272201
* In tcp_input(), don't acquire the pcbinfo global write lock for SYN | John Baldwin | 2014-09-04 | 1 | -3/+0
  packets targeting a listening socket. This reduces TCP input processing starvation under high SYN load (e.g. short-lived TCP connections or SYN flood).
  Submitted by: Julien Charbon <jcharbon@verisign.com>
  Reviewed by: adrian, hiren, jhb, Mike Bentkofsky
  Notes: svn path=/head/; revision=271119
* syncache_lookup() is a file local function. Make it static and | Bjoern A. Zeeb | 2014-05-24 | 1 | -2/+1
  take it out of the public KPI; seems it was never used elsewhere.
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=266619
* Ensure that the flowid hashtype is assigned to the inp if the flowid | Adrian Chadd | 2014-05-18 | 1 | -0/+1
  is also assigned.
  Notes: svn path=/head/; revision=266420
* Utilize SYSCTL_UMA_CUR() to export usage of syncache and | Gleb Smirnoff | 2014-02-07 | 1 | -13/+2
  tcp reassembly zones.
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=261594
* Decrease lock contention within the TCP accept case by removing | George V. Neville-Neil | 2014-01-28 | 1 | -1/+3
  the INP_INFO lock from tcp_usr_accept. As the PR/patch states this was following the advice already in the code. See the PR below for a full discussion of this change and its measured effects.
  PR: 183659
  Submitted by: Julian Charbon
  Reviewed by: jhb
  Notes: svn path=/head/; revision=261242
* If the flowid is available for the mbuf that finalised the creation | Adrian Chadd | 2014-01-18 | 1 | -0/+10
  of a syncache connection, copy it into the inp_flowid field.
  Without this, an incoming TCP connection won't have an inp_flowid marked until some data comes in, and this means that things like the per-CPU TCP timer option will choose a different CPU for the timer work. (It also means that if one grabbed the flowid via an ioctl from userland, it won't be available until some data has been received.)
  Sponsored by: Netflix, Inc.
  Notes: svn path=/head/; revision=260871
* The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare | Gleb Smirnoff | 2013-10-26 | 1 | -0/+1
  for this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=257176
* Implement the ip, tcp, and udp DTrace providers. The probe definitions use | Mark Johnston | 2013-08-25 | 1 | -1/+1
  dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD.
  Tested by: gnn, hiren
  MFC after: 1 month
  Notes: svn path=/head/; revision=254889
* Free the non-fatal "timestamp missing" debug string manually as it is | Andre Oppermann | 2013-07-16 | 1 | -1/+4
  not covered by the catch-all free for the error cases.
  Found by: Coverity
  Notes: svn path=/head/; revision=253395
* Improve SYN cookies by encoding the MSS, WSCALE (window scaling) and SACK | Andre Oppermann | 2013-07-11 | 1 | -211/+359
  information into the ISN (initial sequence number) without the additional use of timestamp bits and switching to the very fast and cryptographically strong SipHash-2-4 MAC hash algorithm to protect the SYN cookie against forgeries.

  The purpose of SYN cookies is to encode all necessary session state in the 32 bits of our initial sequence number to avoid storing any information locally in memory. This is especially important when under heavy spoofed SYN attacks where we would either run out of memory or the syncache would fill with bogus connection attempts swamping out legitimate connections.

  The original SYN cookies method only stored indexed MSS values in the cookie. This isn't sufficient anymore and breaks down in the presence of WSCALE information which is only exchanged during SYN and SYN-ACK. If we can't keep track of it then we may severely underestimate the available send or receive window. This is compounded with large windows whose size information on the TCP segment header is even lower numerically. A number of years back SYN cookies were extended to store the additional state in the TCP timestamp fields, if available on a connection. While timestamps are common among the BSD, Linux and other *nix systems, Windows never enabled them by default and thus they are not present for the vast majority of clients seen on the Internet.

  The common parameters used on TCP sessions have changed quite a bit since SYN cookies were invented some 17 years ago. Today we have a lot more bandwidth available, making the use of window scaling almost mandatory. Also SACK has become standard, making recovering from packet loss much more efficient.

  This change moves all necessary information into the ISS, removing the need for timestamps. Both the MSS (16 bits) and send WSCALE (4 bits) are stored in 3 bit indexed form together with a single bit for SACK. While this is significantly less than the original range, it is sufficient to encode all common values with minimal rounding.

  The MSS depends on the MTU of the path and with the dominance of ethernet the main value seen is around 1460 bytes. Encapsulations for DSL lines and some other overheads reduce it by a few more bytes for many connections seen. Rounding down to the next lower value in some cases isn't a problem as we send only slightly more packets for the same amount of data.

  The send WSCALE index is a bit more tricky as rounding down under-estimates the send space available towards the remote host, however a small number of values dominate and are carefully selected again.

  The receive WSCALE isn't encoded at all but recalculated based on the local receive socket buffer size when a valid SYN cookie returns. A listen socket buffer size is unlikely to change while active.

  The index values for MSS and WSCALE are selected for minimal rounding errors based on large traffic surveys. These values have to be periodically validated against newer traffic surveys, adjusting the arrays tcp_sc_msstab[] and tcp_sc_wstab[] if necessary.

  In addition the hash MAC to protect the SYN cookies is changed from MD5 to SipHash-2-4, a much faster and cryptographically secure algorithm.

  Reviewed by: dwmalone
  Tested by: Fabian Keil <fk@fabiankeil.de>
  Notes: svn path=/head/; revision=253210
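The indexed encoding described above (3 bits for MSS, 3 bits for send WSCALE, 1 bit for SACK) can be sketched as below. The table contents, the bit layout, and all names are assumptions chosen for illustration; they are not claimed to match tcp_sc_msstab[], tcp_sc_wstab[], or the real cookie layout.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative tables only, sorted ascending, 8 entries = 3-bit index. */
static const uint16_t msstab[8] = { 216, 536, 1200, 1360, 1400, 1440, 1452, 1460 };
static const uint16_t wstab[8] = { 0, 1, 2, 3, 4, 6, 7, 8 };

/* Index of the largest table entry that does not exceed value. */
static unsigned
round_down_index(const uint16_t tab[8], uint16_t value)
{
	unsigned i = 7;

	while (i > 0 && tab[i] > value)
		i--;
	return (i);
}

/* Pack MSS index, send WSCALE index and the SACK flag into 7 bits. */
static uint8_t
cookie_pack(uint16_t mss, uint8_t wscale, int sack)
{
	unsigned mi = round_down_index(msstab, mss);
	unsigned wi = round_down_index(wstab, wscale);

	return ((uint8_t)((mi << 4) | (wi << 1) | (sack ? 1u : 0u)));
}

/* Recover the (possibly rounded-down) values from the packed bits. */
static void
cookie_unpack(uint8_t bits, uint16_t *mss, uint8_t *wscale, int *sack)
{
	*mss = msstab[(bits >> 4) & 7];
	*wscale = (uint8_t)wstab[(bits >> 1) & 7];
	*sack = bits & 1;
}
```

With these illustrative tables, an MSS of 1450 rounds down to 1440 and a send WSCALE of 5 rounds down to 4, which is the "minimal rounding" trade-off the commit message describes.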
* Extend debug logging of TCP timestamp related specification | Andre Oppermann | 2013-07-10 | 1 | -0/+17
  violations. Update related comments and style.
  Notes: svn path=/head/; revision=253150
* uma_zone_set_max() directly returns the rounded effective zone | Andre Oppermann | 2013-02-01 | 1 | -2/+2
  limit. Use the return value directly instead of doing a second uma_zone_set_max() step.
  MFC after: 1 week
  Notes: svn path=/head/; revision=246208
* Add TCP_OFFLOAD hook in syncache_respond for IPv6 too, just like the one | Navdeep Parhar | 2013-01-25 | 1 | -0/+9
  that exists for IPv4.
  Reviewed by: bz@
  Notes: svn path=/head/; revision=245919
* Fix sysctl_handle_int() usage. Either arg1 or arg2 should be supplied, | Gleb Smirnoff | 2012-12-25 | 1 | -1/+1
  and arg2 doesn't pass size of arg1.
  Notes: svn path=/head/; revision=244680
* Mechanically substitute flags from historic mbuf allocator with | Gleb Smirnoff | 2012-12-05 | 1 | -1/+1
  malloc(9) flags within sys.
  Exceptions:
  - sys/contrib not touched
  - sys/mbuf.h edited manually
  Notes: svn path=/head/; revision=243882
* For retransmits of SYN|ACK from the syncache use the slightly more | Andre Oppermann | 2012-10-28 | 1 | -1/+1
  aggressive special tcp_syn_backoff[] retransmit schedule instead of the normal tcp_backoff[] schedule for established connections.
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=242261
* Change the syncache count reporting the current number of entries | Andre Oppermann | 2012-10-28 | 1 | -8/+15
  from an unprotected u_int that reports garbage on SMP to a function based sysctl obtaining the current value from UMA.
  Also read back the actual cache_limit after page size rounding by UMA.
  PR: kern/165879
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=242254
* When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to | Andre Oppermann | 2012-10-28 | 1 | -2/+3
  reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering.
  Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state.
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=242250
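The RFC 5681 rule referenced here can be sketched as a small function. This follows the initial-window formula in RFC 5681 section 3.1 and is not the kernel's code, which may use a different or configurable initial window; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Initial congestion window in bytes per RFC 5681: normally 2 to 4
 * segments depending on SMSS, but exactly one segment if the SYN or
 * SYN/ACK had to be retransmitted.
 */
static uint32_t
initial_cwnd(uint32_t smss, unsigned syn_rexmits)
{
	if (syn_rexmits > 0)
		return (smss);
	if (smss > 2190)
		return (2 * smss);
	if (smss > 1095)
		return (3 * smss);
	return (4 * smss);
}
```

For a 1460-byte SMSS this yields 4380 bytes after a clean handshake and 1460 bytes when the SYN or SYN/ACK was retransmitted.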
* Switch the entire IPv4 stack to keep the IP packet header | Gleb Smirnoff | 2012-10-22 | 1 | -3/+3
  in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet.
  After this change a packet processed by the stack isn't modified at all[2] except for TTL.
  After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack.
  [1] One exception still remains. The raw sockets convert host byte order before passing a packet to an application. Probably this would remain for ages for compatibility.
  [2] The ip_input() still subtracts header len from ip->ip_len, but this is planned to be fixed soon.
  Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
  Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
  Notes: svn path=/head/; revision=241913
* - Updated TOE support in the kernel. | Navdeep Parhar | 2012-06-19 | 1 | -73/+62
  - Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features.
  - iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon.

  Build-tested with make universe.

  30s overview
  ============
  What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface:
  # ifconfig -m | grep TOE

  Enable/disable TCP offload on an interface (just like any other ifnet capability):
  # ifconfig cxgbe0 toe
  # ifconfig cxgbe0 -toe

  Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat:
  # netstat -np tcp | grep toe
  # sockstat -46c | grep toe

  Reviewed by: bz, gnn
  Sponsored by: Chelsio communications.
  MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
  Notes: svn path=/head/; revision=237263
* It turns out that too many drivers are not only parsing the L2/3/4 | Bjoern A. Zeeb | 2012-05-28 | 1 | -1/+2
  headers for TSO but also for generic checksum offloading. Ideally we would only have one common function shared amongst all drivers, and perhaps when updating them for IPv6 we should introduce that. Eventually we should provide the meta information along with mbufs to avoid (re-)parsing entirely.
  To not break IPv6 (checksums and offload) and to be able to MFC the changes without risking to hurt 3rd party drivers, duplicate the v4 framework, as other OSes have done as well.
  Introduce interface capability flags for TX/RX checksum offload with IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6 flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6 fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and add an alias for CSUM_DATA_VALID_IPV6.
  This pretty much brings IPv6 handling in line with IPv4. TSO is still handled in a different way and not via if_hwassist.
  Update ifconfig to allow (un)setting of the new capability flags. Update loopback to announce the new capabilities and if_hwassist flags.
  Individual driver updates will have to follow, as will SCTP.
  Reported by: gallatin, dim, ..
  Reviewed by: gallatin (glanced at?)
  MFC after: 3 days
  X-MFC with: r235961,235959,235958
  Notes: svn path=/head/; revision=236170