path: root/sys/netinet/raw_ip.c
Commit message | Author | Age | Files | Lines
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) | Pawel Biernacki | 2020-02-26 | 1 | -2/+3
  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.
  This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.
  Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT.
  Approved by: kib (mentor, blanket)
  Commented by: kib, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D23718
  Notes: svn path=/head/; revision=358333
* White space cleanup -- remove trailing tabs or spaces | Randall Stewart | 2020-02-12 | 1 | -2/+2
  from any line.
  Sponsored by: Netflix Inc.
  Notes: svn path=/head/; revision=357818
* Make in_pcbladdr() require network epoch entered by its callers. Together | Gleb Smirnoff | 2020-01-22 | 1 | -2/+4
  with this widen network epoch coverage up to tcp_connect() and udp_connect().
  Revisions from r356974 and up to this revision cover D23187.
  Differential Revision: https://reviews.freebsd.org/D23187
  Notes: svn path=/head/; revision=356983
* Make ip6_output() and ip_output() require network epoch. | Gleb Smirnoff | 2020-01-22 | 1 | -0/+3
  All callers that previously might have called into these functions without the network epoch must now enter it.
  Notes: svn path=/head/; revision=356974
* Now that there is no R/W lock on PCB list the pcblist sysctls | Gleb Smirnoff | 2019-11-07 | 1 | -51/+21
  handlers can be greatly simplified. All the previous double cycling and complex locking was added to avoid these functions holding global PCB locks for an extended period of time, preventing addition of new entries.
  Notes: svn path=/head/; revision=354484
* Remove unnecessary recursive epoch enter via INP_INFO_RLOCK | Gleb Smirnoff | 2019-11-07 | 1 | -3/+2
  macro in raw input functions for IPv4 and IPv6. They shall always run in the network epoch.
  Notes: svn path=/head/; revision=354474
* When sending IPv4 packets on a SOCK_RAW socket using the IP_HDRINCL option, | Michael Tuexen | 2019-04-13 | 1 | -11/+18
  ensure that the ip_hl field is valid. Furthermore, ensure that the complete IPv4 header is contained in the first mbuf. Finally, move the length checks before relying on them when accessing fields of the IPv4 header.
  Reported by: jtl@
  Reviewed by: jtl@
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D19181
  Notes: svn path=/head/; revision=346182
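The order of checks described in this entry can be sketched in userland C. This is not the kernel code: `hdrincl_check`, its parameters, and `first_len` (standing in for the length of the first mbuf) are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20 /* bytes in an IPv4 header without options */

/*
 * Hypothetical helper, not the kernel implementation.
 * pkt       user-supplied packet, starting with the IPv4 header
 * pkt_len   total number of bytes supplied
 * first_len bytes available contiguously (models the first mbuf)
 * Returns 0 if the header is acceptable, -1 otherwise.
 */
int
hdrincl_check(const uint8_t *pkt, size_t pkt_len, size_t first_len)
{
    size_t hlen;

    assert(pkt != NULL);

    /* Length checks come first, before any header field is read. */
    if (pkt_len < IPV4_MIN_HDR_LEN || first_len < IPV4_MIN_HDR_LEN)
        return -1;

    /* ip_hl is the low nibble of byte 0, counted in 32-bit words. */
    hlen = (size_t)(pkt[0] & 0x0f) << 2;

    /* ip_hl must describe at least the fixed header. */
    if (hlen < IPV4_MIN_HDR_LEN)
        return -1;

    /* The complete header must fit in the packet and in the first buffer. */
    if (hlen > pkt_len || hlen > first_len)
        return -1;

    return 0;
}
```

A caller would reject the send (for example with EINVAL) whenever the helper returns -1.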
* Remove 'dir' argument from dummynet_io(). This makes it possible to make | Gleb Smirnoff | 2019-03-14 | 1 | -1/+1
  dn_dir flags private to dummynet. There is still some room for improvement.
  Notes: svn path=/head/; revision=345165
* Remove 'dir' argument in ng_ipfw_input, since ip_fw_args now has this info. | Gleb Smirnoff | 2019-03-14 | 1 | -2/+1
  While here make 'tee' boolean.
  Notes: svn path=/head/; revision=345163
* Make second argument of ip_divert(), that specifies packet direction a bool. | Gleb Smirnoff | 2019-03-14 | 1 | -1/+1
  This allows pf(4) to avoid including ipfw(4) private files.
  Notes: svn path=/head/; revision=345161
* Improve input validation for raw IPv4 socket using the IP_HDRINCL | Michael Tuexen | 2019-02-12 | 1 | -0/+30
  option.
  This issue was found by running syzkaller on OpenBSD. Greg Steuck made me aware that the problem might also exist on FreeBSD.
  Reported by: Greg Steuck
  MFC after: 1 month
  Differential Revision: https://reviews.freebsd.org/D18834
  Notes: svn path=/head/; revision=344048
* Plug some networking sysctl leaks. | Mark Johnston | 2018-11-22 | 1 | -0/+1
  Various network protocol sysctl handlers were not zero-filling their output buffers and thus would export uninitialized stack memory to userland. Fix a number of such handlers.
  Reported by: Thomas Barabosch, Fraunhofer FKIE
  Reviewed by: tuexen
  MFC after: 3 days
  Security: kernel memory disclosure
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D18301
  Notes: svn path=/head/; revision=340783
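The class of bug this entry describes, and the usual remedy of zero-filling before populating, can be illustrated with a small userland sketch. The record layout and the names `export_rec` and `export_fill` are hypothetical; the real handlers copy out kernel structures that are not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical export record; the unused tail of 'name' is the hazard. */
struct export_rec {
    uint32_t count;
    char     name[16];
};

/*
 * Fill 'out' before it is copied to the consumer. Clearing the whole
 * record first means bytes that are never assigned (here, the tail of
 * 'name' after the string) carry zeros instead of stale stack contents.
 */
void
export_fill(struct export_rec *out, const char *name, uint32_t count)
{
    size_t n;

    assert(out != NULL && name != NULL);

    memset(out, 0, sizeof(*out));

    n = strlen(name);
    if (n >= sizeof(out->name))
        n = sizeof(out->name) - 1; /* keep room for the terminator */
    memcpy(out->name, name, n);    /* terminator comes from the memset */
    out->count = count;
}
```

Without the `memset()`, a record declared on the stack would keep whatever bytes were previously there in every position the code does not overwrite.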
* Removed pointless NULL check | Steven Hartland | 2018-07-10 | 1 | -2/+0
  Removed pointless NULL check after malloc with M_WAITOK which can never return NULL.
  Sponsored by: Multiplay
  Notes: svn path=/head/; revision=336165
* epoch(9): allow preemptible epochs to compose | Matt Macy | 2018-07-04 | 1 | -8/+11
  - Add tracker argument to preemptible epochs
  - Inline epoch read path in kernel and tied modules
  - Change in_epoch to take an epoch as argument
  - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions
  - Remove cases of same function recursion on the epoch as recursing is no longer free.
  - Remove the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate.
  Tested by: pho and Limelight Networks
  Reviewed by: kbowling at llnw dot com
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D16066
  Notes: svn path=/head/; revision=335924
* inpcb: don't gratuitously defer frees | Matt Macy | 2018-07-02 | 1 | -7/+12
  Don't defer frees in sysctl handlers. It isn't necessary and it just confuses things.
  revert: r333911, r334104, and r334125
  Requested by: jtl
  Notes: svn path=/head/; revision=335856
* raw_ip: validate inp in both loops | Matt Macy | 2018-06-21 | 1 | -25/+33
  Continuation of r335497. Also move the lock acquisition up to validate before referencing inp_cred.
  Reported by: pho
  Notes: svn path=/head/; revision=335501
* raw_ip: validate inp | Matt Macy | 2018-06-21 | 1 | -0/+4
  Post r335356 it is possible to have an inpcb on the hash lists that is partially torn down. Validate before using.
  Reported by: pho
  Notes: svn path=/head/; revision=335497
* mechanical CK macro conversion of inpcbinfo lists | Matt Macy | 2018-06-12 | 1 | -6/+6
  This is a dependency for converting the inpcbinfo hash and info rlocks to epoch.
  Notes: svn path=/head/; revision=335016
* convert allocations to INVARIANTS M_ZERO | Matt Macy | 2018-05-24 | 1 | -1/+1
  Notes: svn path=/head/; revision=334125
* UDP: further performance improvements on tx | Matt Macy | 2018-05-23 | 1 | -2/+2
  Cumulative throughput while running 64
    netperf -H $DUT -t UDP_STREAM -- -m 1
  on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps
  Single stream throughput increases from 910kpps to 1.18Mpps
  Baseline:
  https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg
  - Protect read access to global ifnet list with epoch
    https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg
  - Protect short lived ifaddr references with epoch
    https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg
  - Convert if_afdata read lock path to epoch
    https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg
  A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW.
  Reviewed by: gallatin
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D15409
  Notes: svn path=/head/; revision=334118
* epoch: allow for conditionally asserting that the epoch context fields | Matt Macy | 2018-05-23 | 1 | -1/+1
  are unused by zeroing on INVARIANTS builds
  Notes: svn path=/head/; revision=334104
* inpcb: consolidate possible deletion in pcblist functions in to epoch | Matt Macy | 2018-05-20 | 1 | -12/+7
  deferred context.
  Notes: svn path=/head/; revision=333911
* ifnet: Replace if_addr_lock rwlock with epoch + mutex | Matt Macy | 2018-05-18 | 1 | -2/+2
  Run on LLNW canaries and tested by pho@
  gallatin:
  Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace.
  When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch:
    InMpps  OMpps  InGbs  OGbs  err      TCP Est  %CPU   syscalls  csw      irq   GBfree
    4.98    0.00   4.42   0.00  4235592  33       83.80  4720653   2149771  1235  247.32
    4.73    0.00   4.20   0.00  4025260  33       82.99  4724900   2139833  1204  247.32
    4.72    0.00   4.20   0.00  4035252  33       82.14  4719162   2132023  1264  247.32
    4.71    0.00   4.21   0.00  4073206  33       83.68  4744973   2123317  1347  247.32
    4.72    0.00   4.21   0.00  4061118  33       80.82  4713615   2188091  1490  247.32
    4.72    0.00   4.21   0.00  4051675  33       85.29  4727399   2109011  1205  247.32
    4.73    0.00   4.21   0.00  4039056  33       84.65  4724735   2102603  1053  247.32
  After the patch:
    InMpps  OMpps  InGbs  OGbs  err      TCP Est  %CPU   syscalls  csw      irq   GBfree
    5.43    0.00   4.20   0.00  3313143  33       84.96  5434214   1900162  2656  245.51
    5.43    0.00   4.20   0.00  3308527  33       85.24  5439695   1809382  2521  245.51
    5.42    0.00   4.19   0.00  3316778  33       87.54  5416028   1805835  2256  245.51
    5.42    0.00   4.19   0.00  3317673  33       90.44  5426044   1763056  2332  245.51
    5.42    0.00   4.19   0.00  3314839  33       88.11  5435732   1792218  2499  245.52
    5.44    0.00   4.19   0.00  3293228  33       91.84  5426301   1668597  2121  245.52
  Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch
  Reviewed by: gallatin
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D15366
  Notes: svn path=/head/; revision=333813
* Revert r331379 as the "simple" lock changes have revealed a deeper problem | Sean Bruno | 2018-03-23 | 1 | -2/+0
  and need for a rethink.
  Submitted by: Jason Eggleston <jason@eggnet.com>
  Sponsored by: Limelight Networks
  Notes: svn path=/head/; revision=331454
* Simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput. | Sean Bruno | 2018-03-22 | 1 | -0/+2
  Submitted by: Jason Eggleston <jason@eggnet.com>
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D14624
  Notes: svn path=/head/; revision=331379
* sys: further adoption of SPDX licensing ID tags. | Pedro F. Giffuni | 2017-11-20 | 1 | -0/+2
  Mainly focus on files that use BSD 3-Clause license.
  The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
  Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
  Notes: svn path=/head/; revision=326023
* Reduce in_pcbinfo_init() by two params. No users supply any flags to this | Gleb Smirnoff | 2017-05-15 | 1 | -1/+1
  function (they used to say UMA_ZONE_NOFREE), so flag parameter goes away. The zone_fini parameter also goes away. Previously no protocols (except divert) supplied zone_fini function, so inpcb locks were leaked with slabs. This was okay while zones were allocated with UMA_ZONE_NOFREE flag, but now this is a leak. Fix that by supplying inpcb_fini() function as fini method for all inpcb zones.
  Notes: svn path=/head/; revision=318321
* Hide struct inpcb, struct tcpcb from the userland. | Gleb Smirnoff | 2017-03-21 | 1 | -6/+1
  This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas, on the other hand we still eventually modify them and tools like netstat(1) never work on next version of FreeBSD. We maintain a ton of spares in them, and we already got some ifdef hell at the end of tcpcb.
  Details:
  - Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
  - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space.
  - Make kernel and userland utilities compilable after these changes.
  - Bump __FreeBSD_version.
  Reviewed by: rrs, gnn
  Differential Revision: D10018
  Notes: svn path=/head/; revision=315662
* Renumber copyright clause 4 | Warner Losh | 2017-02-28 | 1 | -1/+1
  Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point.
  Submitted by: Jan Schaumann <jschauma@stevens.edu>
  Pull Request: https://github.com/freebsd/freebsd/pull/96
  Notes: svn path=/head/; revision=314436
* Merge projects/ipsec into head/. | Andrey V. Elsukov | 2017-02-06 | 1 | -6/+5
  Small summary
  -------------
  o Almost all IPsec related code was moved into sys/netipsec.
  o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules.
  o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA.
  o New network pseudo interface if_ipsec(4) added. For now it is built as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs.
  o The network stack now invokes IPsec functions using special methods. Only one header file, <netipsec/ipsec_support.h>, should be included to declare all the needed things to work with IPsec.
  o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods.
  o TCP_SIGNATURE support was reworked to be closer to the RFC.
  o PF_KEY SADB was reworked:
    - now all security associations are stored in the single SPI namespace, and all SAs MUST have unique SPI.
    - several hash tables added to speed up lookups in SADB.
    - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups at the same time.
    - many PF_KEY message handlers were reworked to reflect changes in SADB.
    - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses.
  o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6.
  o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet.
  o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms.
  o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code.
  o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting.
  Reviewed by: gnn, wblock
  Obtained from: Yandex LLC
  Relnotes: yes
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D9352
  Notes: svn path=/head/; revision=313330
* Ensure that the buffer length and the length provided in the IPv4 | Michael Tuexen | 2017-01-13 | 1 | -1/+1
  header match when using a raw socket to send IPv4 packets and providing the header. If they don't match, let send return -1 and set errno to EINVAL.
  Before this patch it was only enforced that the length in the header is not larger than the buffer length.
  PR: 212283
  Reviewed by: ae, gnn
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D9161
  Notes: svn path=/head/; revision=312063
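The stricter rule in this entry (equality rather than "not larger than") can be sketched as a userland check. `hdrincl_len_check` is a hypothetical name, and the sketch assumes the length field is read as it appears on the wire, that is, big-endian in bytes 2 and 3 of the header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel implementation.
 * Returns 0 when the total-length field of the IPv4 header equals the
 * number of bytes the caller handed to send(), EINVAL otherwise.
 */
int
hdrincl_len_check(const uint8_t *pkt, size_t buf_len)
{
    uint16_t ip_len;

    assert(pkt != NULL);

    /* Bytes 2..3 must exist before they are read. */
    if (buf_len < 4)
        return EINVAL;

    /* Total length, big-endian on the wire. */
    ip_len = (uint16_t)(((unsigned)pkt[2] << 8) | pkt[3]);

    /* The older rule would have been: ip_len <= buf_len. */
    return (ip_len == buf_len) ? 0 : EINVAL;
}
```

Under the older rule a 40-byte buffer carrying a header that claims 28 bytes would have been accepted; with the equality check it is rejected.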
* Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. | Kevin Lo | 2016-09-15 | 1 | -2/+2
  Reviewed by: gnn
  Differential Revision: https://reviews.freebsd.org/D7878
  Notes: svn path=/head/; revision=305824
* The pr_destroy field does not allow us to run the teardown code in a | Bjoern A. Zeeb | 2016-06-01 | 1 | -2/+3
  specific order. VNET_SYSUNINITs however are doing exactly that. Thus remove the VIMAGE conditional field from the domain(9) protosw structure and replace it with VNET_SYSUNINITs. This also allows us to change some order and to make the teardown functions file local static. Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use internally.
  Slightly reshuffle the SI_SUB_* fields in kernel.h and add new ones, e.g., for pfil consumers (firewalls), partially for this commit and for others to come.
  Reviewed by: gnn, tuexen (sctp), jhb (kernel.h)
  Obtained from: projects/vnet
  MFC after: 2 weeks
  X-MFC: do not remove pr_destroy
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D6652
  Notes: svn path=/head/; revision=301114
* Send an ICMP packet indicating destination unreachable/protocol | Michael Tuexen | 2016-05-25 | 1 | -1/+4
  unreachable if the packet is handled neither in the kernel nor in userspace.
  MFC after: 1 week
  Notes: svn path=/head/; revision=300687
* Count packets as not being delivered only if they are neither | Michael Tuexen | 2016-05-25 | 1 | -2/+6
  processed by a kernel handler nor by a raw socket.
  MFC after: 1 week
  Notes: svn path=/head/; revision=300679
* netinet: for pointers replace 0 with NULL. | Pedro F. Giffuni | 2016-04-15 | 1 | -1/+1
  These are mostly cosmetic, no functional change.
  Found with devel/coccinelle.
  Reviewed by: ae, tuexen
  Notes: svn path=/head/; revision=298066
* Mfp: r296345 | Bjoern A. Zeeb | 2016-04-09 | 1 | -2/+1
  No need to keep type stability on raw sockets zone. We've also been running with a KASSERT since r222488 to make sure the ipi_count is 0 on destroy.
  PR: 164763
  Reviewed by: gnn
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D5735
  Notes: svn path=/head/; revision=297735
* Remove sys/eventhandler.h from net/route.h | Alexander V. Chernikov | 2016-01-09 | 1 | -0/+1
  Reviewed by: ae
  Notes: svn path=/head/; revision=293470
* Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. | Andrey V. Elsukov | 2015-07-29 | 1 | -6/+8
  Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here.
  Reviewed by: gnn (previous version)
  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D3149
  Notes: svn path=/head/; revision=286001
* o Use new function ip_fillid() in all places throughout the kernel, | Gleb Smirnoff | 2015-04-01 | 1 | -1/+5
    where we want to create a new IP datagram.
  o Add support for RFC6864, which allows setting the IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by default.
  o If we generate an IP ID, use counter(9) to improve performance.
  o Gather all code related to IP ID into ip_id.c.
  Differential Revision: https://reviews.freebsd.org/D2177
  Reviewed by: adrian, cy, rpaulo
  Tested by: Emeric POUPON <emeric.poupon stormshield.eu>
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Relnotes: yes
  Notes: svn path=/head/; revision=280971
* Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. | Gleb Smirnoff | 2014-11-07 | 1 | -1/+1
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=274225
* Make SOCK_RAW sockets to be truly raw, not modifying received and sent | Gleb Smirnoff | 2014-09-01 | 1 | -14/+2
  packets at all. Swapping byte order on SOCK_RAW was actually a bug, an artifact from the BSD network stack, that used to convert a packet to native byte order once it is received by kernel.
  Other operating systems didn't follow this, and later other BSD descendants fixed this, leaving us alone with the bug. Now it is clear that we should fix the bug.
  In collaboration with: Olivier Cochard-Labbé <olivier cochard.me>
  See also: https://wiki.freebsd.org/SOCK_RAW
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=270929
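Since packets are no longer modified on the way in or out, an application that builds or parses its own IPv4 header on a raw socket would handle multi-byte fields in network byte order, as they appear on the wire. A minimal sketch for the total-length field follows; the helper names `ipv4_set_len` and `ipv4_get_len` are hypothetical.

```c
#include <arpa/inet.h> /* htons, ntohs */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store the total length at header bytes 2..3 in network byte order. */
void
ipv4_set_len(uint8_t *hdr, uint16_t total_len)
{
    uint16_t be;

    assert(hdr != NULL);
    be = htons(total_len);
    memcpy(hdr + 2, &be, sizeof(be));
}

/* Read the total length back, converting to host byte order. */
uint16_t
ipv4_get_len(const uint8_t *hdr)
{
    uint16_t be;

    assert(hdr != NULL);
    memcpy(&be, hdr + 2, sizeof(be));
    return ntohs(be);
}
```

Code written for the older behaviour, which passed such fields in host byte order, would need the explicit conversion shown here.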
* Change pr_output's prototype to avoid the need for explicit casts. | Kevin Lo | 2014-08-15 | 1 | -1/+8
  This is a follow up to r269699.
  Phabric: D564
  Reviewed by: jhb
  Notes: svn path=/head/; revision=270008
* Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have | Kevin Lo | 2014-08-08 | 1 | -4/+7
  only one protocol switch structure that is shared between ipv4 and ipv6.
  Phabric: D476
  Reviewed by: jhb
  Notes: svn path=/head/; revision=269699
* Fix jailed raw sockets not setting the correct source address by | Steven Hartland | 2014-04-24 | 1 | -7/+7
  calling in_pcbladdr instead of prison_get_ip4
  MFC after: 1 month
  Notes: svn path=/head/; revision=264879
* netinet code no longer uses IFA_RTSELF. | Gleb Smirnoff | 2013-11-05 | 1 | -4/+0
  Notes: svn path=/head/; revision=257693
* Cleanup in_ifscrub(), which is just an entry to in_scrubprefix(). | Gleb Smirnoff | 2013-11-01 | 1 | -2/+2
  Notes: svn path=/head/; revision=257499
* The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare | Gleb Smirnoff | 2013-10-26 | 1 | -0/+1
  to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Notes: svn path=/head/; revision=257176
* Mechanically substitute flags from historic mbuf allocator with | Gleb Smirnoff | 2012-12-05 | 1 | -1/+1
  malloc(9) flags within sys.
  Exceptions:
  - sys/contrib not touched
  - sys/mbuf.h edited manually
  Notes: svn path=/head/; revision=243882
* Do not reduce ip_len by size of IP header in the ip_input() | Gleb Smirnoff | 2012-10-23 | 1 | -6/+4
  before passing a packet to protocol input routines.
  For several protocols this means that the protocol now needs to do the subtraction itself, and for another half this means that we do not need to add header length back to the packet.
  Make ip_stripoptions() adjust ip_len, since we now enter this function with a packet header whose ip_len represents the length of the entire packet, not the payload only.
  Notes: svn path=/head/; revision=241923
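With ip_len covering the entire packet, a protocol input routine that wants the payload size subtracts the header length itself. A minimal sketch of that arithmetic, assuming `ip_len` has already been converted to host byte order and `ip_hl` is the header length in 32-bit words; `ipv4_payload_len` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical helper: payload bytes = total length - header length.
 * Returns -1 if the header length is below the 20-byte minimum or the
 * total length is shorter than the header.
 */
int
ipv4_payload_len(uint16_t ip_len, uint8_t ip_hl)
{
    int hlen = (ip_hl & 0x0f) << 2; /* words to bytes */

    if (hlen < 20 || ip_len < hlen)
        return -1;
    return ip_len - hlen;
}
```

For an 84-byte datagram with a 20-byte header (`ip_hl` of 5), the payload is 84 - 20 = 64 bytes.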