path: root/sys/netinet6
Commit message | Author | Age | Files | Lines
* Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain | Andrew Gallatin | 2020-12-19 | 2 | -13/+26

  In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO_REUSEPORT_LB, a number of workers can share a listen socket. However, even if a worker sets affinity to a core or set of cores on a NUMA domain, it will receive connections associated with all NUMA domains in the system. This will lead to cross-domain traffic when the server writes to the socket or calls sendfile(), and memory is allocated on the server's local NUMA node, but transmitted on the NUMA node associated with the TCP connection. Similarly, when the server reads from the socket, it will likely be reading memory allocated on the NUMA domain associated with the TCP connection.

  This change provides a new socket option, TCP_REUSPORT_LB_NUMA. A server can now tell the kernel to filter traffic so that only incoming connections associated with the desired NUMA domain are given to the server. (Of course, in the case where there are no servers sharing the listen socket on some domain, then as a fallback, traffic will be hashed as normal to all servers sharing the listen socket regardless of domain.) This allows a server to deal only with traffic that is local to its NUMA domain, and avoids cross-domain traffic in most cases.

  This patch, and a corresponding small patch to nginx to use TCP_REUSPORT_LB_NUMA, allows us to serve 190Gb/s of kTLS encrypted https media content from dual-socket Xeons with only 13% (as measured by pcm.x) cross-domain traffic on the memory controller.

  Reviewed by: jhb, bz (earlier version), bcr (man page)
  Tested by: gonzo
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D21636
  Notes: svn path=/head/; revision=368819
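  The filtering and fallback behaviour described above can be modelled in user space. The following is only a sketch of the selection rule, not the kernel code: `struct lb_worker` and `lb_pick_worker()` are hypothetical names, and the way the hash is reduced to an index is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one worker sharing the listen socket. */
struct lb_worker {
	int id;			/* worker identifier */
	int numa_domain;	/* domain the worker asked to be filtered on */
};

/*
 * Pick a worker for a connection that arrived on conn_domain.
 * Workers in the same domain are preferred; if there are none, fall
 * back to hashing across every worker regardless of domain.
 * Returns the chosen worker's id, or -1 if there are no workers.
 */
static int
lb_pick_worker(const struct lb_worker *w, size_t n, int conn_domain,
    uint32_t hash)
{
	size_t i, idx, local = 0;

	assert(w != NULL || n == 0);
	if (n == 0)
		return (-1);

	/* Count the workers that live on the connection's domain. */
	for (i = 0; i < n; i++)
		if (w[i].numa_domain == conn_domain)
			local++;

	/* Fallback: nobody is local, hash over all workers. */
	if (local == 0)
		return (w[hash % n].id);

	/* Otherwise hash only among the local workers. */
	idx = hash % local;
	for (i = 0; i < n; i++) {
		if (w[i].numa_domain != conn_domain)
			continue;
		if (idx-- == 0)
			return (w[i].id);
	}
	return (-1);	/* not reached */
}
```

  With two workers on domain 0 and one on domain 1, a connection associated with domain 1 always reaches the single domain-1 worker, while a connection on a domain with no workers is spread over all three.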
* Expose nonstandard IPv6 kernel definitions to standalone builds. | Hans Petter Selasky | 2020-12-04 | 1 | -1/+1

  No functional change.

  Reviewed by: bz@
  MFC after: 1 week
  Sponsored by: Mellanox Technologies // NVIDIA Networking
  Notes: svn path=/head/; revision=368353
* Remove RADIX_MPATH config option. | Alexander V. Chernikov | 2020-11-29 | 1 | -5/+1

  ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option.

  Reviewed by: glebius
  Differential Revision: https://reviews.freebsd.org/D27244
  Notes: svn path=/head/; revision=368164
* Refactor fib4/fib6 functions. | Alexander V. Chernikov | 2020-11-29 | 2 | -47/+84

  No functional changes.

  * Make lookup path of fib<4|6>_lookup_debugnet() separate functions (fib<4|6>_lookup_rt()). These will be used in the control plane code requiring unlocked radix operations and actual prefix pointer.
  * Make lookup part of fib<4|6>_check_urpf() separate functions. This change simplifies the switch to alternative lookup implementations, which helps algorithmic lookups introduction.
  * While here, use static initializers for IPv4/IPv6 keys.

  Differential Revision: https://reviews.freebsd.org/D27405
  Notes: svn path=/head/; revision=368147
* IPv6: set ifdisabled in the kernel rather than in rc | Bjoern A. Zeeb | 2020-11-25 | 1 | -1/+8

  Enable ND6_IFF_IFDISABLED when the interface is created in the kernel before return to user space.

  This avoids a race when an interface is created by a program which also calls ifconfig IF inet6 -ifdisabled and races with the devd -> /etc/pccard_ether -> .. netif start IF -> ifdisabled calls (the devd/rc framework disabling IPv6 again after the program had enabled it already).

  In case the global net.inet6.ip6.accept_rtadv was turned on, we also default to enabling IPv6 on the interfaces, rather than disabling them.

  PR: 248172
  Reported by: Gert Doering (gert greenie.muc.de)
  Reviewed by: glebius (, phk)
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D27324
  Notes: svn path=/head/; revision=368031
* Refactor rib iterator functions. | Alexander V. Chernikov | 2020-11-22 | 1 | -1/+1

  * Make rib_walk() order of arguments consistent with the rest of RIB api
  * Add rib_walk_ext() allowing to exec callback before/after iteration.
  * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del
  * Rename rt_foreach_fib_walk -> rib_foreach_table_walk
  * Move rib_foreach_table_walk{_del} to route/route_helpers.c
  * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations.

  Differential Revision: https://reviews.freebsd.org/D27219
  Notes: svn path=/head/; revision=367941
* Fix implicit automatic local port selection for IPv6 during connect calls. | Jonathan T. Looney | 2020-11-14 | 1 | -1/+2

  When a user creates a TCP socket and tries to connect to the socket without explicitly binding the socket to a local address, the connect call implicitly chooses an appropriate local port. When evaluating candidate local ports, the algorithm checks for conflicts with existing ports by doing a lookup in the connection hash table.

  In this circumstance, both the IPv4 and IPv6 code look for exact matches in the hash table. However, the IPv4 code goes a step further and checks whether the proposed 4-tuple will match wildcard (e.g. TCP "listen") entries. The IPv6 code has no such check.

  The missing wildcard check can cause problems when connecting to a local server. It is possible that the algorithm will choose the same value for the local port as the foreign port uses. This results in a connection with identical source and destination addresses and ports. Changing the IPv6 code to align with the IPv4 code's behavior fixes this problem.

  Reviewed by: tuexen
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D27164
  Notes: svn path=/head/; revision=367680
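  The effect of the missing wildcard check can be shown with a small user-space model. This is a sketch under simplifying assumptions, not the in_pcb code: addresses are omitted (everything is assumed to be on one local address), and `struct conn_entry`, `port_conflicts()` and `pick_local_port()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified connection-table entry. */
struct conn_entry {
	uint16_t lport;		/* local port */
	uint16_t fport;		/* foreign port, ignored for wildcards */
	bool wildcard;		/* true for a listening socket */
};

/* Does a candidate local port collide with an existing entry? */
static bool
port_conflicts(const struct conn_entry *tbl, size_t n, uint16_t lport,
    uint16_t fport, bool check_wildcard)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].lport != lport)
			continue;
		if (tbl[i].wildcard) {
			/* The check the IPv6 path was missing. */
			if (check_wildcard)
				return (true);
		} else if (tbl[i].fport == fport)
			return (true);	/* exact match */
	}
	return (false);
}

/* Return the first free port in [first, last], or 0 if none is free. */
static uint16_t
pick_local_port(const struct conn_entry *tbl, size_t n, uint16_t first,
    uint16_t last, uint16_t fport, bool check_wildcard)
{
	uint32_t p;

	assert(first != 0 && first <= last);
	for (p = first; p <= last; p++)
		if (!port_conflicts(tbl, n, (uint16_t)p, fport,
		    check_wildcard))
			return ((uint16_t)p);
	return (0);
}
```

  With a listener on port 5000 and a connect to port 5000, the exact-match-only search can return 5000 as the local port (source equals destination), while the search with the wildcard check skips it.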
* Fix use-after-free in icmp6_notify_error(). | Alexander V. Chernikov | 2020-10-28 | 1 | -9/+5

  Reported by: Maxime Villard <max at m00nbsd.net>
  Reviewed by: markj
  MFC after: 3 days
  Notes: svn path=/head/; revision=367114
* icmp6: Count packets dropped due to an invalid hop limit | Mark Johnston | 2020-10-19 | 3 | -5/+10

  Pad the icmp6stat structure so that we can add more counters in the future without breaking compatibility again, last done in r358620. Annotate the rarely executed error paths with __predict_false while here.

  Reviewed by: bz, melifaro
  Sponsored by: NetApp, Inc.
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D26578
  Notes: svn path=/head/; revision=366842
* Implement flowid calculation for outbound connections to balance connections over multiple paths. | Alexander V. Chernikov | 2020-10-18 | 8 | -35/+122

  Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections.

  This change creates simple hashing functions and starts calculating hashes for TCP, UDP/UDP-Lite and raw IP if multipath routes are present in the system.

  Reviewed by: glebius (previous version), ae
  Differential Revision: https://reviews.freebsd.org/D26523
  Notes: svn path=/head/; revision=366813
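  To illustrate the idea of deriving a flowid from the connection tuple, here is a minimal sketch. It uses FNV-1a as a stand-in hash; the commit does not say which hash the kernel uses, and `struct flow6`, `flow6_hash()` and `flow6_pick_path()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv6 flow key. */
struct flow6 {
	uint8_t src[16];
	uint8_t dst[16];
	uint16_t sport;
	uint16_t dport;
	uint8_t proto;
};

/* FNV-1a (32-bit), fed incrementally; a stand-in for the real hash. */
static uint32_t
fnv1a_update(uint32_t h, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len-- > 0) {
		h ^= *p++;
		h *= 16777619u;
	}
	return (h);
}

/* Hash each field separately so struct padding never enters the hash. */
static uint32_t
flow6_hash(const struct flow6 *f)
{
	uint32_t h = 2166136261u;

	assert(f != NULL);
	h = fnv1a_update(h, f->src, sizeof(f->src));
	h = fnv1a_update(h, f->dst, sizeof(f->dst));
	h = fnv1a_update(h, &f->sport, sizeof(f->sport));
	h = fnv1a_update(h, &f->dport, sizeof(f->dport));
	h = fnv1a_update(h, &f->proto, sizeof(f->proto));
	return (h);
}

/* Map a flowid onto one of npaths equal-cost paths. */
static unsigned
flow6_pick_path(uint32_t flowid, unsigned npaths)
{
	assert(npaths > 0);
	return (flowid % npaths);
}
```

  Because the hash depends only on the tuple, every packet of one connection maps to the same path, while different connections can spread across the available paths.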
* Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow. | Richard Scheffenegger | 2020-10-09 | 2 | -0/+50

  This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7.

  Note that for untagged traffic, explicitly adding a priority will insert a special 802.1Q vlan header with vlan ID = 0 to carry the priority setting.

  Reviewed by: gallatin, rrs
  MFC after: 2 weeks
  Sponsored by: NetApp, Inc.
  Differential Revision: https://reviews.freebsd.org/D26409
  Notes: svn path=/head/; revision=366569
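  As a rough illustration of how a priority travels in a VLAN ID 0 tag: the 16-bit 802.1Q tag control field carries the priority in its top three bits and the VLAN ID in its low twelve bits. The helpers below are hypothetical and only model the value range and the bit layout; they are not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Valid option values: -1 (interface default) or a priority 0..7. */
static bool
vlan_pcp_valid(int pcp)
{
	return (pcp >= -1 && pcp <= 7);
}

/*
 * Build an 802.1Q tag control field: PCP in bits 15..13, DEI (left
 * at 0 here) in bit 12, VLAN ID in bits 11..0.
 */
static uint16_t
vlan_tci(int pcp, uint16_t vid)
{
	assert(pcp >= 0 && pcp <= 7);
	assert(vid < 4096);
	return ((uint16_t)(((unsigned)pcp << 13) | vid));
}
```

  For untagged traffic with priority 5, the priority-only tag would therefore carry `vlan_tci(5, 0)`, which is `0xA000`.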
* Introduce scalable route multipath. | Alexander V. Chernikov | 2020-10-03 | 3 | -40/+25

  This change is based on the nexthop objects landed in D24232.

  The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection.

  Similar to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t. their weights.

  With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presence of NHF_MULTIPATH flag. All dataplane lookup functions return pointer to the nexthop object, leaving nexthop groups details inside routing subsystem.

  User-visible changes:

  The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1.

  All routes now come with weight, default weight is 1, maximum is 2^24-1.

  Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes.

  Using functionality:
  * Recompile kernel with ROUTE_MPATH
  * set net.route.multipath to 1

    route add -6 2001:db8::/32 2001:db8::2 -weight 10
    route add -6 2001:db8::/32 2001:db8::3 -weight 20

    netstat -6On

    Nexthop groups data

    Internet6:
    GrpIdx  NhIdx   Weight  Slots   Gateway      Netif   Refcnt
    1       ------- ------- ------- ------------ ------- 1
            13      10      1       2001:db8::2  vlan2
            14      20      2       2001:db8::3  vlan2

  Next steps:
  * Land outbound hashing for locally-originated routes (D26523).
  * Fix net/bird multipath (net/frr seems to work fine)
  * Add ROUTE_MPATH to GENERIC
  * Set net.route.multipath=1 by default

  Tested by: olivier
  Reviewed by: glebius
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D26449
  Notes: svn path=/head/; revision=366390
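  One way to picture the "array of nexthop pointers, compiled w.r.t. their weights" is the sketch below. It assumes the weights are reduced by their greatest common divisor, which matches the Slots column in the netstat output above (weights 10 and 20 give 1 and 2 slots) but is not confirmed by the commit message. `nhgrp_compile()` and its slot cap parameter are hypothetical; nexthops are represented by their index.

```c
#include <assert.h>
#include <stddef.h>

static unsigned
gcd_u(unsigned a, unsigned b)
{
	while (b != 0) {
		unsigned t = a % b;

		a = b;
		b = t;
	}
	return (a);
}

/*
 * Fill slots[] with nexthop indexes, each repeated in proportion to
 * its weight. Returns the number of slots used, or 0 if the group is
 * empty or would not fit in max_slots.
 */
static size_t
nhgrp_compile(const unsigned *weights, size_t n, int *slots,
    size_t max_slots)
{
	size_t i, k = 0, total = 0;
	unsigned g = 0, j;

	assert(weights != NULL && slots != NULL);
	if (n == 0)
		return (0);

	/* Assumed normalization: divide all weights by their GCD. */
	for (i = 0; i < n; i++) {
		assert(weights[i] > 0);
		g = gcd_u(g, weights[i]);
	}
	for (i = 0; i < n; i++)
		total += weights[i] / g;
	if (total > max_slots)
		return (0);

	for (i = 0; i < n; i++)
		for (j = 0; j < weights[i] / g; j++)
			slots[k++] = (int)i;
	return (total);
}
```

  A lookup then reduces to `slots[flowid % nslots]`, so a nexthop with twice the weight receives roughly twice the flows.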
* Rework part of routing code to reduce difference to D26449. | Alexander V. Chernikov | 2020-09-21 | 1 | -10/+15

  * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times.
  * Start filling route weight in route change notifications
  * Pass flowid to UDP/raw IP route lookups
  * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops.

  Differential Revision: https://reviews.freebsd.org/D26497
  Notes: svn path=/head/; revision=365973
* Remove unused nhop_ref_any() function. | Alexander V. Chernikov | 2020-09-20 | 2 | -3/+0

  Remove "opt_mpath.h" header where not needed.

  No functional changes.

  Notes: svn path=/head/; revision=365930
* if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS. | Navdeep Parhar | 2020-09-18 | 1 | -1/+2

  This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic.

  A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities.

  On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers.

  An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port.

  On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already.

  Future work:
  Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead.
  Rx: LRO. Depends on what happens with the previous item. There will have to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state.

  Reviewed by: kib@
  Relnotes: Yes
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D25873
  Notes: svn path=/head/; revision=365870
* Add a knob to allow zero UDP checksums for UDP/IPv6 traffic on the given UDP port. | Navdeep Parhar | 2020-09-18 | 1 | -1/+14

  This will be used by some upcoming changes to if_vxlan(4). RFC 7348 (VXLAN) says that the UDP checksum "SHOULD be transmitted as zero. When a packet is received with a UDP checksum of zero, it MUST be accepted for decapsulation." But the original IPv6 RFCs did not allow zero UDP checksum. RFC 6935 attempts to resolve this.

  Reviewed by: kib@
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D25873
  Notes: svn path=/head/; revision=365869
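  The receive-side decision this knob enables can be sketched as follows. This is an illustrative model only: the function name, the port list representation and the `verified_ok` input are assumptions, and the actual knob's name and storage are not given in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide whether a UDP/IPv6 datagram passes the checksum stage.
 * uh_sum is the checksum field from the UDP header, verified_ok is
 * the result of normal checksum verification, and zero_ok_ports is a
 * hypothetical list of destination ports allowed to use checksum 0.
 */
static bool
udp6_csum_acceptable(uint16_t uh_sum, uint16_t dport, bool verified_ok,
    const uint16_t *zero_ok_ports, size_t n)
{
	size_t i;

	assert(zero_ok_ports != NULL || n == 0);

	/* Non-zero checksum: the usual verification decides. */
	if (uh_sum != 0)
		return (verified_ok);

	/* Zero checksum: accept only on explicitly allowed ports. */
	for (i = 0; i < n; i++)
		if (zero_ok_ports[i] == dport)
			return (true);
	return (false);
}
```

  With the IANA-assigned VXLAN port 4789 on the list, a zero-checksum datagram to port 4789 is accepted, while the same datagram to any other port is still dropped.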
* net: clean up empty lines in .c and .h files | Mateusz Guzik | 2020-09-01 | 20 | -73/+9

  Notes: svn path=/head/; revision=365071
* ipv6: quit dropping packets looping back on p2p interfaces | Kyle Evans | 2020-08-31 | 1 | -17/+1

  To paraphrase the below-referenced PR:

  This logic originated in the KAME project, and was even controversial when it was enabled there by default in 2001. No such equivalent logic exists in the IPv4 stack, and it turns out that this leads to us dropping valid traffic when the "point to point" interface is actually a 1:many tun interface, e.g. with the wireguard userland stack.

  Even in the case of true point-to-point links, this logic only avoids transient looping of packets sent by misconfigured applications or attackers, which can be subverted by proper route configuration rather than hardcoded logic in the kernel to drop packets.

  In the review, melifaro goes on to note that the kernel can't fix it, so it perhaps shouldn't try to be 'smart' about it. Additionally, that TTL will still kick in even with incorrect route configuration.

  PR: 247718
  Reviewed by: melifaro, rgrimes
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D25567
  Notes: svn path=/head/; revision=364982
* Move net/route/shared.h definitions to net/route/route_var.h. | Alexander V. Chernikov | 2020-08-28 | 2 | -2/+0

  No functional changes.

  net/route/shared.h was created in the initial phases of nexthop conversion. It was intended to serve the same purpose as route_var.h: share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largely defeated its purpose.

  As this is no longer the case and the number of route_var.h includes is roughly the same as for shared.h, retire the latter in favour of the former.

  Notes: svn path=/head/; revision=364941
* Make net.inet6.ip6.deembed_scopeid behaviour default & remove sysctl. | Alexander V. Chernikov | 2020-08-15 | 3 | -9/+1

  Submitted by: Neel Chauhan <neel AT neelc DOT org>
  Differential Revision: https://reviews.freebsd.org/D25637
  Notes: svn path=/head/; revision=364250
* Simplify dom_<rtattach|rtdetach>. | Alexander V. Chernikov | 2020-08-14 | 3 | -21/+11

  Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'.

  Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach.

  Add rib_subscribe_internal() function to accept subscriptions to the rnh directly.

  Differential Revision: https://reviews.freebsd.org/D26053
  Notes: svn path=/head/; revision=364238
* Use a static initializer for the multicast free tasks. | Hans Petter Selasky | 2020-08-11 | 1 | -8/+1

  This makes the SYSINIT() function updated in r364072 superfluous.

  Suggested by: glebius@
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=364102
* Fix rib_subscribe() waitok flag by performing allocation outside epoch. | Alexander V. Chernikov | 2020-08-11 | 1 | -7/+4

  Make in6_inithead() use rib_subscribe with waitok to achieve reliable subscription allocation.

  Reviewed by: glebius
  Notes: svn path=/head/; revision=364099
* MC: add a note with reference to the discussion and history as-to why we are where we are now. | Bjoern A. Zeeb | 2020-08-10 | 1 | -0/+1

  The main thing is to try to get rid of the delayed freeing to avoid blocking on the taskq when shutting down vnets.

  X-Timeout: if you still see this before 14-RELEASE remove it.
  Notes: svn path=/head/; revision=364075
* Make sure the multicast release tasks are properly drained when destroying a VNET or a network interface. | Hans Petter Selasky | 2020-08-10 | 3 | -3/+11

  Otherwise the inm release tasks, both IPv4 and IPv6, may cause a panic by accessing a freed VNET or network interface.

  Reviewed by: jmg@
  Discussed with: bz@
  Differential Revision: https://reviews.freebsd.org/D24914
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=364073
* Use proper prototype for SYSINIT() functions. | Hans Petter Selasky | 2020-08-10 | 1 | -1/+1

  Mark the unused argument using the __unused macro.

  Discussed with: kib@
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=364072
* IPV6_PKTINFO support for v4-mapped IPv6 sockets | Bjoern A. Zeeb | 2020-08-07 | 1 | -1/+1

  When using v4-mapped IPv6 sockets with IPV6_PKTINFO we do not respect the given v4-mapped src address on the IPv4 socket. Implement the needed functionality. This allows single-socket UDP applications (such as OpenVPN) to work better on FreeBSD.

  Requested by: Gert Doering (gert greenie.net), pfsense
  Tested by: Gert Doering (gert greenie.net)
  Reviewed by: melifaro
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D24135
  Notes: svn path=/head/; revision=364018
* Fix typo. | Andrey V. Elsukov | 2020-08-05 | 1 | -1/+1

  Submitted by: Evgeniy Khramtsov <evgeniy at khramtsov org>
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D25932
  Notes: svn path=/head/; revision=363900
* Remove an incorrect assertion from in6p_lookup_mcast_ifp(). | Mark Johnston | 2020-08-04 | 1 | -9/+5

  The socket may be bound to an IPv4-mapped IPv6 address. However, the inp address is not relevant to the JOIN_GROUP or LEAVE_GROUP operations.

  While here remove an unnecessary check for inp == NULL.

  Reported by: syzbot+d01ab3d5e6c1516a393c@syzkaller.appspotmail.com
  Reviewed by: hselasky
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D25888
  Notes: svn path=/head/; revision=363841
* ip6_output(): Check the return value of in6_getlinkifnet(). | Mark Johnston | 2020-07-30 | 1 | -0/+4

  If the destination address has an embedded scope ID, make sure that it corresponds to a valid ifnet before proceeding. Otherwise a sendto() with a bogus link-local address can trigger a NULL pointer dereference.

  Reported by: syzkaller
  Reviewed by: ae
  Fixes: r358572
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D25887
  Notes: svn path=/head/; revision=363710
* Transition from rtrequest1_fib() to rib_action(). | Alexander V. Chernikov | 2020-07-21 | 4 | -37/+48

  Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to rib_action(). This is part of the new routing KPI.

  Submitted by: Neel Chauhan <neel AT neelc DOT org>
  Differential Revision: https://reviews.freebsd.org/D25546
  Notes: svn path=/head/; revision=363403
* Temporarily revert r363319 to unbreak the build. | Alexander V. Chernikov | 2020-07-19 | 4 | -48/+37

  Reported by: CI
  Pointy hat to: melifaro
  Notes: svn path=/head/; revision=363320
* Transition from rtrequest1_fib() to rib_action(). | Alexander V. Chernikov | 2020-07-19 | 4 | -37/+48

  Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to rib_action(). This is part of the new routing KPI.

  Submitted by: Neel Chauhan <neel AT neelc DOT org>
  Differential Revision: https://reviews.freebsd.org/D25546
  Notes: svn path=/head/; revision=363319
* Switch inet6 default route subscription to the new rib subscription api. | Alexander V. Chernikov | 2020-07-12 | 4 | -29/+20

  The old subscription model allowed only a single customer. Switch inet6 to the new subscription api and eliminate the old model.

  Differential Revision: https://reviews.freebsd.org/D25615
  Notes: svn path=/head/; revision=363128
* Fix IPv6 regression introduced by r362900. | Alexander V. Chernikov | 2020-07-03 | 1 | -1/+1

  PR: kern/247729
  Notes: svn path=/head/; revision=362909
* Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup(). | Alexander V. Chernikov | 2020-07-02 | 6 | -226/+23

  fib[46]_lookup_nh_ represents the pre-epoch generation of the fib api, providing fewer guarantees over pointer validity and requiring on-stack data copying.

  With no callers remaining, remove fib[46]_lookup_nh_ functions.

  Submitted by: Neel Chauhan <neel AT neelc DOT org>
  Differential Revision: https://reviews.freebsd.org/D25445
  Notes: svn path=/head/; revision=362900
* Add the SCTP_SUPPORT kernel option. | Mark Johnston | 2020-06-18 | 3 | -7/+7

  This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to support a loadable SCTP implementation.

  Discussed with: tuexen
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=362338
* Retire SCTP_SO_LOCK_TESTING. | Michael Tuexen | 2020-06-07 | 1 | -14/+0

  This was intended to test the locking used in the MacOS X kernel on a FreeBSD system, to make use of WITNESS and other debugging infrastructure. This hasn't been used for ages, so take it out to reduce the #ifdef complexity.

  MFC after: 1 week
  Notes: svn path=/head/; revision=361895
* Fix typo in previous commit | Ryan Moeller | 2020-06-03 | 1 | -1/+1

  Applied the wrong patch

  Reported by: Michael Butler <imb@protected-networks.net>
  Approved by: mav (mentor)
  Sponsored by: iXsystems.com
  Notes: svn path=/head/; revision=361757
* scope6: Check for NULL afdata before dereferencing | Ryan Moeller | 2020-06-03 | 1 | -0/+4

  Narrows the race window with if_detach.

  Approved by: mav (mentor)
  MFC after: 3 days
  Sponsored by: iXsystems, Inc.
  Differential Revision: https://reviews.freebsd.org/D25017
  Notes: svn path=/head/; revision=361756
* * Add rib_<add|del|change>_route() functions to manipulate the routing table. | Alexander V. Chernikov | 2020-06-01 | 3 | -0/+3

  The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this approach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired.

  Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code.

  With that in mind, introduce rib_cmd_info notification structure and rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications.

  * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions.

  Reviewed by: ae
  Differential Revision: https://reviews.freebsd.org/D25067
  Notes: svn path=/head/; revision=361706
* Revert r361704, it accidentally committed merged D25067 and D25070. | Alexander V. Chernikov | 2020-06-01 | 3 | -3/+0

  Notes: svn path=/head/; revision=361705
* * Add rib_<add|del|change>_route() functions to manipulate the routing table. | Alexander V. Chernikov | 2020-06-01 | 3 | -0/+3

  The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this approach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired.

  Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code.

  With that in mind, introduce rib_cmd_info notification structure and rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications.

  * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions.

  Reviewed by: ae
  Differential Revision: https://reviews.freebsd.org/D25067
  Notes: svn path=/head/; revision=361704
* Use fib[46]_lookup() in mtu calculations. | Alexander V. Chernikov | 2020-05-28 | 1 | -0/+2

  fib[46]_lookup_nh_ represents the pre-epoch generation of the fib api, providing fewer guarantees over pointer validity and requiring on-stack data copying.

  Conversion is straightforward, as the only 2 differences are the requirement of running in network epoch and the need to handle the RTF_GATEWAY case in the caller code.

  Differential Revision: https://reviews.freebsd.org/D24974
  Notes: svn path=/head/; revision=361576
* Replace ip6_output fib6_lookup_nh_<ext|basic> calls with fib6_lookup(). | Alexander V. Chernikov | 2020-05-28 | 1 | -23/+23

  fib6_lookup_nh_ represents the pre-epoch generation of the fib api, providing fewer guarantees over pointer validity and requiring on-stack data copying.

  Conversion is straightforward, as the only 2 differences are the requirement of running in network epoch and the need to handle the RTF_GATEWAY case in the caller code.

  Reviewed by: ae
  Differential Revision: https://reviews.freebsd.org/D24973
  Notes: svn path=/head/; revision=361573
* Switch gif(4) path verification to fib[46]_check_urpf(). | Alexander V. Chernikov | 2020-05-28 | 1 | -7/+3

  fibX_lookup_nh_ represents the pre-epoch generation of the fib api, providing fewer guarantees over pointer validity and requiring on-stack data copying. Use specialized fib[46]_check_urpf() from newer KPI instead, to allow removal of older KPI.

  Reviewed by: ae
  Differential Revision: https://reviews.freebsd.org/D24978
  Notes: svn path=/head/; revision=361572
* Move <add|del|change>_route() functions to route_ctl.c in preparation of multipath control plane changes described in D24141. | Alexander V. Chernikov | 2020-05-23 | 1 | -1/+0

  Currently route.c contains core routing init/teardown functions, route table manipulation functions and various helper functions, resulting in >2KLOC file in total. This change moves most of the route table manipulation parts to a dedicated file, simplifying planned multipath changes and making route.c more manageable.

  Differential Revision: https://reviews.freebsd.org/D24870
  Notes: svn path=/head/; revision=361421
* Use epoch(9) for rtentries to simplify control plane operations. | Alexander V. Chernikov | 2020-05-23 | 2 | -15/+18

  Currently the only reason for refcounting rtentries is the need to report the rtable operation details immediately after the execution. Delaying rtentry reclamation allows us to stop refcounting and simplify the code. Additionally, this change allows reimplementing rib_lookup_info(), which is used by some of the customers to get the matching prefix along with nexthops, in a more efficient way.

  The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to nhop_priv to be able to reliably set curvnet even during vnet teardown. The rest of the reference counting code will be removed in D24867.

  Differential Revision: https://reviews.freebsd.org/D24866
  Notes: svn path=/head/; revision=361409
* Allow TCP to reuse local port with different destinations | Mike Karels | 2020-05-18 | 2 | -10/+25

  Previously, tcp_connect() would bind a local port before connecting, forcing the local port to be unique across all outgoing TCP connections for the address family. Instead, choose a local port after selecting the destination and the local address, requiring only that the tuple is unique and does not match a wildcard binding.

  Reviewed by: tuexen (rscheff, rrs previous version)
  MFC after: 1 month
  Sponsored by: Forcepoint LLC
  Differential Revision: https://reviews.freebsd.org/D24781
  Notes: svn path=/head/; revision=361228
* IPv6: Fix a panic in the nd6 code with unmapped mbufs. | Andrew Gallatin | 2020-05-12 | 1 | -3/+21

  If the neighbor entry for an IPv6 TCP session using unmapped mbufs times out, IPv6 will send an icmp6 dest. unreachable message. In doing this, it will try to do a software checksum on the reflected packet. If this is a TCP session using unmapped mbufs, then there will be a kernel panic.

  To fix this, just free packets with unmapped mbufs, rather than sending the icmp.

  Reviewed by: np, rrs
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D24821
  Notes: svn path=/head/; revision=360982