src - FreeBSD source tree

	Commit message	Author	Age	Files	Lines
*	netinet: Make in_canforward() return bool	Zhenlei Huang	2025-03-02	1	-1/+1
	No functional change intended.
	MFC after: 5 days
*	netinet: use in_broadcast() inline	Gleb Smirnoff	2025-02-22	1	-3/+1
	There should be no functional change.
	Reviewed by: rrs, markj
	Differential Revision: https://reviews.freebsd.org/D49088
*	ip: Defer checks for an unspecified dstaddr until after pfil hooks	Mark Johnston	2025-01-16	1	-5/+11
	To comply with Common Criteria certification requirements, it may be necessary to ensure that packets to 0.0.0.0/::0 are dropped and logged by the system firewall. Currently, such packets are dropped by ip_input() and ip6_input() before reaching pfil hooks; let's defer the checks slightly to give firewalls a chance to drop the packets themselves, as this gives better observability. Add some regression tests for this with pf+pflog.
	Note that prior to commit 713264f6b8b, v4 packets to the unspecified address were not dropped by the IP stack at all.
	Note that ip_forward() and ip6_forward() ensure that such packets are not forwarded; they are passed back unmodified. Add a regression test which ensures that such packets are visible to pflog.
	Reviewed by: glebius
	MFC after: 3 weeks
	Sponsored by: Klara, Inc.
	Sponsored by: OPNsense
	Differential Revision: https://reviews.freebsd.org/D48163
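The ordering change described above can be illustrated with a small userspace sketch. It is not the kernel code: `enum verdict`, `fw_hook_t`, `input_verdict()` and `drop_unspecified()` are hypothetical names invented for this example, and the real pfil(9) return codes differ. It only shows that running the firewall hook before the stack's own check lets the firewall drop (and log) packets to the unspecified address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts; the real pfil(9) return codes differ. */
enum verdict { V_PASS, V_DROP_FIREWALL, V_DROP_STACK };

/* Hypothetical firewall callback: returns true to drop the packet. */
typedef bool (*fw_hook_t)(uint32_t dst);

/*
 * Run the firewall hook first and the stack's sanity check second, so a
 * firewall rule gets the chance to drop packets addressed to 0.0.0.0.
 */
enum verdict
input_verdict(uint32_t dst, fw_hook_t hook)
{
	if (hook != NULL && hook(dst))
		return (V_DROP_FIREWALL);
	if (dst == 0)		/* the unspecified address */
		return (V_DROP_STACK);
	return (V_PASS);
}

/* Example hook that drops packets to the unspecified address. */
bool
drop_unspecified(uint32_t dst)
{
	return (dst == 0);
}
```

With no hook attached, the stack still drops the packet itself, so the behaviour for systems without a firewall is unchanged in this sketch.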
*	netinet: handle blackhole routes	Kristof Provost	2024-11-20	1	-0/+12
	If during ip_forward() we find a blackhole (or reject) route we should stop processing and count this in the 'cantforward' counter, just like we already do for IPv6.
	Blackhole routes are set to use the loopback interface, so we don't actually incorrectly forward traffic, but we do fail to count it as unroutable.
	Test this, both for IPv4 and IPv6.
	Reviewed by: melifaro
	Sponsored by: Rubicon Communications, LLC ("Netgate")
	Differential Revision: https://reviews.freebsd.org/D47529
*	pfil: PFIL_PASS never frees the mbuf	Kristof Provost	2024-01-29	1	-4/+0
	pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed to them. (E.g. when rejecting a packet, or when gathering up packets for reassembly).
	If the hook returns PFIL_PASS the mbuf must still be present. Assert this in pfil_mem_common() and ensure that ipfilter follows this convention. pf and ipfw already did.
	Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf must have been freed (or now be owned by the firewall for further processing, like packet scheduling or reassembly).
	This allows us to remove a few extraneous NULL checks.
	Suggested by: tuexen
	Reviewed by: tuexen, zlei
	Sponsored by: Rubicon Communications, LLC ("Netgate")
	Differential Revision: https://reviews.freebsd.org/D43617
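The ownership contract described above can be sketched in userspace C. Everything here is a simplified stand-in: `struct pkt`, `enum hook_result`, `run_hook()` and the two sample hooks are hypothetical and do not reproduce the pfil(9) definitions. The point is only that a "pass" result implies the packet pointer is still valid, while any other result means the caller must not touch it again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins; not the kernel's definitions. */
enum hook_result { HOOK_PASS, HOOK_DROPPED, HOOK_CONSUMED };
struct pkt { int len; };

typedef enum hook_result (*hook_fn)(struct pkt **);

/* Run one hook and enforce the ownership contract on its result. */
enum hook_result
run_hook(hook_fn hook, struct pkt **pp)
{
	enum hook_result rv = hook(pp);

	if (rv == HOOK_PASS)
		assert(*pp != NULL);	/* caller still owns the packet */
	else
		*pp = NULL;		/* freed or owned by the hook now */
	return (rv);
}

/* Sample hook: lets the packet through untouched. */
enum hook_result
pass_hook(struct pkt **pp)
{
	(void)pp;
	return (HOOK_PASS);
}

/* Sample hook: rejects the packet and frees it. */
enum hook_result
drop_hook(struct pkt **pp)
{
	free(*pp);
	*pp = NULL;
	return (HOOK_DROPPED);
}
```

Because the contract is asserted in one place, callers in this sketch need no NULL check of their own after a pass result.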
*	sys: Remove ancient SCCS tags.	Warner Losh	2023-11-27	1	-2/+0
	Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.
	Sponsored by: Netflix
*	sys: Remove $FreeBSD$: one-line .c pattern	Warner Losh	2023-08-16	1	-2/+0
	Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
*	netinet*: Fix redirects for connections from localhost	Doug Rabson	2023-05-31	1	-1/+21
	Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter rules to change the destination address and port for a connection. Typically, the rule triggers on an input event when a packet is received by a router and the destination address and/or port is changed to implement the redirect. When a reply packet on this connection is output to the network, the rule triggers again, reversing the modification.
	When the connection is initiated on the same host as the packet filter, it is initially output via lo0 which queues it for input processing. This causes an input event on the lo0 interface, allowing redirect processing to rewrite the destination and create state for the connection. However, when the reply is received, no corresponding output event is generated; instead, the packet is delivered to the higher level protocol (e.g. tcp or udp) without reversing the redirect, the reply is not matched to the connection and the packet is dropped (for tcp, a connection reset is also sent).
	This commit fixes the problem by adding a second packet filter call in the input path. The second call happens right before the handoff to higher level processing and provides the missing output event to allow the redirect's reply processing to perform its rewrite.
	This extra processing is disabled by default and can be enabled using pfilctl:
		pfilctl link -o pf:default-out inet-local
		pfilctl link -o pf:default-out6 inet6-local
	PR: 268717
	Reviewed-by: kp, melifaro
	MFC-after: 2 weeks
	Differential Revision: https://reviews.freebsd.org/D40256
*	netinet: Tighten checks for unspecified source addresses	Mark Johnston	2023-03-06	1	-0/+5
	The assertions added in commit b0ccf53f2455 ("inpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()") revealed that protocol layers may pass the unspecified address to in_pcblookup(). Add some checks to filter out such packets before we attempt an inpcb lookup:
	- Disallow the use of an unspecified source address in in_pcbladdr() and in6_pcbladdr().
	- Disallow IP packets with an unspecified destination address.
	- Disallow TCP packets with an unspecified source address, and add an assertion to verify the comment claiming that the case of an unspecified destination address is handled by the IP layer.
	Reported by: syzbot+9ca890fb84e984e82df2@syzkaller.appspotmail.com
	Reported by: syzbot+ae873c71d3c71d5f41cb@syzkaller.appspotmail.com
	Reported by: syzbot+e3e689aba1d442905067@syzkaller.appspotmail.com
	Reviewed by: glebius, melifaro
	MFC after: 2 weeks
	Sponsored by: Klara, Inc.
	Sponsored by: Modirum MDPay
	Differential Revision: https://reviews.freebsd.org/D38570
*	IfAPI: Explicitly include <net/if_private.h> in netstack	Justin Hibbits	2023-01-31	1	-0/+1
	Summary: In preparation of making if_t completely opaque outside of the netstack, explicitly include the header. <net/if_var.h> will stop including the header in the future.
	Sponsored by: Juniper Networks, Inc.
	Reviewed by: glebius, melifaro
	Differential Revision: https://reviews.freebsd.org/D38200
*	Unbreak builds having SCTP support compiled in	Michael Tuexen	2022-11-07	1	-0/+1
	Including sctp_var.h requires INET to be defined if IPv4 support is needed.
*	netinet*: remove PRC_ constants and streamline ICMP processing	Gleb Smirnoff	2022-10-04	1	-17/+0
	In the original design of the network stack, the protocol control input method pr_ctlinput was used to notify the protocols about two very different kinds of events: internal system events and receipt of ICMP messages from outside. These events were coded with PRC_ codes. Today these methods are removed from the protosw(9) and are isolated to IPv4 and IPv6 stacks and are called only from icmp*_input(). The PRC_ codes now just create a shim layer between ICMP codes and errors or actions taken by protocols.
	- Change ipproto_ctlinput_t to pass just a pointer to the ICMP header. This allows protocols to not deduce it from the internal IP header.
	- Change ip6proto_ctlinput_t to pass just a struct ip6ctlparam pointer. It has all the information the protocols need. In the structure, change the ip6c_finaldst field to sockaddr_in6. The reason is that icmp6_input() already has this address wrapped in sockaddr, and the protocols want this address as sockaddr.
	- For UDP tunneling control input, as well as for IPSEC control input, change the prototypes to accept a transparent union of either an ICMP header pointer or a struct ip6ctlparam pointer.
	- In icmp_input() and icmp6_input() do only validation of the ICMP header and count bad packets. The translation of ICMP codes to errors/actions is done by protocols.
	- Provide icmp_errmap() and icmp6_errmap() as substitutes for the inetctlerrmap and inet6ctlerrmap arrays.
	- In protocol ctlinput methods either trust what icmp_errmap() recommends, or do our own logic based on the ICMP header.
	Differential revision: https://reviews.freebsd.org/D36731
*	netinet*: use sparse C99 initializer for inetctlerrmap	Gleb Smirnoff	2022-10-04	1	-6/+14
	and mark those PRC_* codes that are used. The rest are dead code. This is not a functional change, but is meant to make the following changes easier to review.
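For readers unfamiliar with the technique named in this commit, here is a minimal sketch of a sparse C99 (designated) initializer for an error-mapping array. The `EV_*` codes and the `ev_errmap` table are hypothetical stand-ins invented for illustration, not the PRC_* constants or the contents of inetctlerrmap. Slots that are not named are zero-initialized, which makes the unused entries easy to spot.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical event codes; stand-ins for the PRC_* constants. */
enum {
	EV_IFDOWN = 0,
	EV_MSGSIZE = 5,
	EV_UNREACH_NET = 8,
	EV_UNREACH_HOST = 9,
	EV_NCODES = 16
};

/* Sparse initializer: only the used codes are listed, the rest are 0. */
const int ev_errmap[EV_NCODES] = {
	[EV_MSGSIZE] = EMSGSIZE,
	[EV_UNREACH_NET] = EHOSTUNREACH,
	[EV_UNREACH_HOST] = EHOSTUNREACH,
};

/* Bounds-checked lookup; unknown or unused codes map to 0. */
int
ev_to_errno(int ev)
{
	if (ev < 0 || ev >= EV_NCODES)
		return (0);
	return (ev_errmap[ev]);
}
```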
*	net: employ newly added pfil_mbuf_{in,out} where appropriate	Mateusz Guzik	2022-09-08	1	-1/+1
	Reviewed by: glebius
	Sponsored by: Rubicon Communications, LLC ("Netgate")
	Differential Revision: https://reviews.freebsd.org/D36454
*	net: remove stale altq_input reference	Mateusz Guzik	2022-09-07	1	-6/+0
	Code setting it was removed in:
		commit 325fab802e1f40c992141f945d0788c0edfdb1a4
		Author: Eric van Gyzen <vangyzen@FreeBSD.org>
		Date: Tue Dec 4 23:46:43 2018 +0000
		altq: remove ALTQ3_COMPAT code
	Reviewed by: glebius, kp
	Sponsored by: Rubicon Communications, LLC ("Netgate")
	Differential Revision: https://reviews.freebsd.org/D36471
*	raw ip: fix regression with multicast and RSVP	Gleb Smirnoff	2022-09-02	1	-4/+0
	With 61f7427f02a raw sockets protosw has wildcard pr_protocol. Protocol of a specific pcb is stored in inp_ip_p.
	Reviewed by: karels
	Reported by: karels
	Differential revision: https://reviews.freebsd.org/D36429
	Fixes: 61f7427f02a307d28af674a12c45dd546e3898e4
*	ip_reass: separate ipreass_init() into global and VIMAGE parts	Gleb Smirnoff	2022-08-17	1	-1/+4
	Should have been done in 89128ff3e42.
*	protosw: retire pr_drain and use EVENTHANDLER(9) directly	Gleb Smirnoff	2022-08-17	1	-15/+0
	The method was called for two different conditions: 1) the VM layer is low on pages or 2) one of the UMA zones of the mbuf allocator is exhausted. This change splits 2) into a new event handler, but all affected network subsystems are modified to subscribe to both, so this change shall not bring functional changes under different low memory situations.
	There were three subsystems still using pr_drain: TCP, SCTP and frag6. The latter had its protosw entry for the sole reason of registering its pr_drain method.
	Reviewed by: tuexen, melifaro
	Differential revision: https://reviews.freebsd.org/D36164
*	ip_reass: use callout(9) directly instead of pr_slowtimo	Gleb Smirnoff	2022-08-17	1	-20/+0
	Reviewed by: melifaro
	Differential revision: https://reviews.freebsd.org/D36236
*	protosw: separate pr_input and pr_ctlinput out of protosw	Gleb Smirnoff	2022-08-17	1	-63/+42
	The protosw KPI historically has implemented two quite orthogonal things: protocols that implement a certain kind of socket, and protocols that are IPv4/IPv6 protocols. These two things do not make one-to-one correspondence. The pr_input and pr_ctlinput methods were utilized only in IP protocols. This strange duality required IP protocols that don't have a socket to declare protosw, e.g. carp(4). On the other hand, developers of socket protocols thought that they always need to define pr_input/pr_ctlinput, which led to strange dead code, e.g. div_input() or sdp_ctlinput().
	With this change pr_input and pr_ctlinput as part of protosw disappear and IPv4/IPv6 get their private single level protocol switch tables ip_protox[] and ip6_protox[] respectively, pointing at arrays of ipproto_input_t functions. The pr_ctlinput that was used for control input coming from the network (ICMP, ICMPv6) is now represented by ip_ctlprotox[] and ip6_ctlprotox[].
	ipproto_register() becomes the only official way to register in the table. Those protocols that were always static, and that hardly anybody is interested in making loadable, are now registered by ip_init(), ip6_init(). An IP protocol that considers itself unloadable shall register itself within its own private SYSINIT().
	Reviewed by: tuexen, melifaro
	Differential revision: https://reviews.freebsd.org/D36157
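A single-level protocol switch table of the kind described above can be sketched as an array of function pointers indexed by the 8-bit IP protocol number. This is a userspace illustration only: `protox`, `proto_register()`, `proto_dispatch()` and `echo_len()` are hypothetical names, and the signatures of the real ipproto_register() and ipproto_input_t are not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NPROTO 256	/* one slot per 8-bit IP protocol number */

/* Hypothetical input handler type for this sketch. */
typedef int (*proto_input_t)(const uint8_t *payload, size_t len);

/* The switch table; empty slots are NULL. */
proto_input_t protox[NPROTO];

/* Register a handler, refusing to overwrite an occupied slot. */
int
proto_register(uint8_t proto, proto_input_t fn)
{
	if (fn == NULL)
		return (EINVAL);
	if (protox[proto] != NULL)
		return (EEXIST);
	protox[proto] = fn;
	return (0);
}

/* Dispatch a payload by protocol number; -1 if nothing is registered. */
int
proto_dispatch(uint8_t proto, const uint8_t *payload, size_t len)
{
	if (protox[proto] == NULL)
		return (-1);
	return (protox[proto](payload, len));
}

/* Sample handler that just reports the payload length. */
int
echo_len(const uint8_t *payload, size_t len)
{
	(void)payload;
	return ((int)len);
}
```

Keeping the table separate from any socket-layer structure is what lets a protocol without sockets register an input handler and nothing else.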
*	sockbuf: merge two versions of sbcreatecontrol() into one	Gleb Smirnoff	2022-05-17	1	-28/+28
	No functional change.
*	ip_mroute: refactor epoch-based locking	Wojciech Macek	2022-02-02	1	-6/+0
	Remove duplicated epoch_enter and epoch_exit in IP inp/outp routines. Remove unnecessary macros as well.
	Obtained from: Semihalf
	Sponsored by: Stormshield
	Reviewed by: glebius
	Differential revision: https://reviews.freebsd.org/D34030
*	protocols: init with standard SYSINIT(9) or VNET_SYSINIT	Gleb Smirnoff	2022-01-03	1	-13/+16
	The historical BSD network stack loop that rolls over domains and over protocols has no advantages over more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers.
	Getting rid of pr_init achieves several things:
	o Get rid of ifdef's that protect against double foo_init() when both INET and INET6 are compiled in.
	o Isolate initializers statically to the module they init.
	o Make the code easier to understand and maintain.
	Reviewed by: melifaro
	Differential revision: https://reviews.freebsd.org/D33537
*	IPv4: fix redirect sending conditions	Bjoern A. Zeeb	2021-12-26	1	-2/+7
	RFC792,1009,1122 state the original conditions for sending a redirect. RFC1812 further refines these. ip_forward() still specifies the checks originally implemented for these (we do slightly more/different than suggested as makes sense). The implementation added in 8ad114c082a159c0dde95aa35d2e3e108aa30a75 to ip_tryforward() however is flawed and may send "multi-hop" redirects (to a host not on the directly connected network).
	Do proper checks in ip_tryforward() to stop us from sending redirects in situations where we may not. Keep as much logic out of ip_tryforward() and in ip_redir_alloc() and only do the mbuf copy once we are sure we will send a redirect.
	While here enhance and fix comments as to which conditions are handled for sending redirects in various places.
	Reported by: pi (on net@ 2021-12-04)
	MFC after: 3 days
	Sponsored by: Dr.-Ing. Nepustil & Co. GmbH
	Reviewed by: cy, others (earlier versions)
	Differential Revision: https://reviews.freebsd.org/D33274
*	Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"	Cy Schubert	2021-12-02	1	-2/+1
	This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
	A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
*	wpa: Import wpa_supplicant/hostapd commit 14ab4a816	Cy Schubert	2021-12-02	1	-1/+2
	This is the November update to vendor/wpa committed upstream 2021-11-26.
	MFC after: 1 month
*	ip_input: remove pointless check in INP_RECVIF handling	Gleb Smirnoff	2021-12-02	1	-2/+1
	An mbuf rcvif pointer is supposed to be valid and doesn't need extra checks. The code appeared in d314ad7b73639.
*	Add net.inet.ip.source_address_validation	Gleb Smirnoff	2021-11-12	1	-0/+16
	Drop packets arriving from the network that have our source IP address. If maliciously crafted they can create evil effects like an RST exchange between two of our listening TCP ports. Such packets just can't be legitimate. Enable the tunable by default. Long overdue for a modern Internet host.
	Reviewed by: donner, melifaro
	Differential revision: https://reviews.freebsd.org/D32914
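The check described above amounts to comparing the packet's source address against the host's own addresses. The following userspace sketch illustrates the idea under stated assumptions: `src_is_ours()` and `sav_should_drop()` are hypothetical names, a linear array stands in for the kernel's local address hash, and the exemption for packets received on a loopback interface is an assumption made for this example (the commit message only says "arriving from the network").

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if src matches one of the host's own addresses. */
bool
src_is_ours(uint32_t src, const uint32_t *local, size_t nlocal)
{
	for (size_t i = 0; i < nlocal; i++) {
		if (local[i] == src)
			return (true);
	}
	return (false);
}

/*
 * Decide whether an inbound packet should be dropped.  Packets that
 * arrived via loopback are exempt in this sketch (an assumption), since
 * traffic a host sends to itself legitimately carries its own address.
 */
bool
sav_should_drop(uint32_t src, bool from_loopback,
    const uint32_t *local, size_t nlocal)
{
	if (from_loopback)
		return (false);
	return (src_is_ours(src, local, nlocal));
}
```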
*	ip_input: packet filters shall not modify m_pkthdr.rcvif	Gleb Smirnoff	2021-11-12	1	-3/+2
	Quick review confirms that they do not, also IPv6 doesn't expect such a change in mbuf. In IPv4 this appeared in 0aade26e6d061, which doesn't seem to have a valid explanation why.
	Reviewed by: donner, kp, melifaro
	Differential revision: https://reviews.freebsd.org/D32913
*	Rename net.inet.ip.check_interface to rfc1122_strong_es and document it.	Gleb Smirnoff	2021-11-12	1	-43/+26
	This very questionable feature was enabled in FreeBSD for a very short time. It was disabled very soon upon merging to RELENG_4 - 23d7f14119bf. And in HEAD was also disabled pretty soon - 4bc37f9836fb1.
	The tunable has a very vague name. Check interface for what? Given that it was never documented and almost never enabled, I think it is fine to rename it together with documenting it.
	Also, count packets dropped by this tunable as ips_badaddr, otherwise they fall down to the ips_cantforward counter, which is misleading, as the packet was not supposed to be forwarded, it was destined locally.
	Reviewed by: donner, kp
	Differential revision: https://reviews.freebsd.org/D32912
*	net: sprinkle __predict_false in ip_input on error conditions	Mateusz Guzik	2021-11-12	1	-12/+15
	While here rearrange the RSVP check to inspect proto first and avoid evaluating V_rsvp in the common case to begin with (most notably avoid the expensive read).
	Reviewed by: glebius
	Sponsored by: Rubicon Communications, LLC ("Netgate")
	Differential Revision: https://reviews.freebsd.org/D32929
*	Use network epoch to protect local IPv4 addresses hash.	Gleb Smirnoff	2021-10-22	1	-8/+4
	Modifications to the hash are already naturally locked by in_control_sx. Convert the hash lists to CK lists. Remove the in_ifaddr_rmlock. Assert the network epoch where necessary.
	In most cases when the hash lookup is done the epoch is already entered. Cover a few cases that need entering the epoch, which mostly is initial configuration of tunnel interfaces and multicast addresses.
	Reviewed by: melifaro
	Differential revision: https://reviews.freebsd.org/D32584
*	routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).	Zhenlei Huang	2021-08-22	1	-4/+7
	Implement kernel support for RFC 5549/8950.
	* Relax control plane restrictions and allow specifying IPv6 gateways for IPv4 routes. This behavior is controlled by the net.route.rib_route_ipv6_nexthop sysctl (on by default).
	* Always pass final destination in ro->ro_dst in ip_forward().
	* Use ro->ro_dst to extract packet family inside if_output() routines. Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.
	* Pass extracted family to nd6_resolve() to get the LLE with proper encap. It leverages recent lltable changes committed in c541bd368f86.
	Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
	Example usage:
		route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0
	Differential Revision: https://reviews.freebsd.org/D30398
	MFC after: 2 weeks
*	ip_forward: Restore RFC reference	Zhenlei Huang	2021-05-22	1	-2/+5
	Add RFC reference lost in 3d846e48227e2e78c1e7b35145f57353ffda56ba
	PR: 255388
	Reviewed By: rgrimes, donner, karels, marcus, emaste
	MFC after: 27 days
	Differential Revision: https://reviews.freebsd.org/D30374
*	Do not forward datagrams originated by link-local addresses	Zhenlei Huang	2021-05-18	1	-7/+9
	The current implementation of ip_input() rejects packets destined for 169.254.0.0/16, but not those originating from 169.254.0.0/16 link-local addresses.
	Fix to fully respect RFC 3927 section 2.7.
	PR: 255388
	Reviewed by: donner, rgrimes, karels
	MFC after: 1 month
	Differential Revision: https://reviews.freebsd.org/D29968
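The rule this commit enforces can be expressed as a prefix test on both addresses. Below is a minimal userspace sketch, assuming addresses in host byte order; `is_linklocal()` and `may_forward()` are hypothetical helper names for this example, not the kernel's macros or functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr (host byte order) falls inside 169.254.0.0/16. */
bool
is_linklocal(uint32_t addr)
{
	return ((addr & 0xffff0000u) == 0xa9fe0000u);
}

/*
 * A datagram may be forwarded only if neither its source nor its
 * destination is an IPv4 link-local address.
 */
bool
may_forward(uint32_t src, uint32_t dst)
{
	return (!is_linklocal(src) && !is_linklocal(dst));
}
```

Checking only the destination, as the commit message says the earlier code did, corresponds to dropping the `is_linklocal(src)` term.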
*	mroute: fix race condition during mrouter shutting down	Wojciech Macek	2021-05-11	1	-1/+8
	There is a race condition between V_ip_mrouter de-init and ip_mforward handling. It might happen that mrouted is cleaned up after the V_ip_mrouter check and before processing the packet in ip_mforward. Use an epoch call approach, similar to IPSec which also handles such a case.
	Reported by: Damien Deville
	Obtained from: Stormshield
	Reviewed by: mw
	Differential Revision: https://reviews.freebsd.org/D29946
*	Flush remaining routes from the routing table during VNET shutdown.	Alexander V. Chernikov	2021-03-10	1	-5/+1
	Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET.
	PR: 253998
	Reported by: rashey at superbox.pl
	MFC after: 3 days
	Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures is to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
	Test Plan:
	```
	set_skip:set_skip_group_lo -> passed [0.053s]
	tail -n 200 /var/log/messages | grep rtentry
	```
	Reviewers: #network, kp, bz
	Reviewed By: kp
	Subscribers: imp, ae
	Differential Revision: https://reviews.freebsd.org/D29116
*	Remove RADIX_MPATH config option.	Alexander V. Chernikov	2020-11-29	1	-4/+0
	ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option.
	Reviewed by: glebius
	Differential Revision: https://reviews.freebsd.org/D27244
	Notes: svn path=/head/; revision=368164
*	ip_fastfwd: style(9) tidy for r367628	Ed Maste	2020-11-13	1	-1/+1
	Discussed with: gnn
	MFC with: r367628
	Notes: svn path=/head/; revision=367645
*	An earlier commit effectively turned off the fast forwarding path	George V. Neville-Neil	2020-11-12	1	-3/+6
	due to its lack of support for ICMP redirects. The following commit adds redirects to the fastforward path, again allowing for decent forwarding performance in the kernel.
	Reviewed by: ae, melifaro
	Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
	Notes: svn path=/head/; revision=367628
*	net: clean up empty lines in .c and .h files	Mateusz Guzik	2020-09-01	1	-2/+0
	Notes: svn path=/head/; revision=365071
*	Fix a possible next-hop refcount leak when handling IPSec traffic.	Mark Johnston	2020-07-01	1	-0/+1
	It may be possible to fix this by deferring the lookup, but let's keep the initial change simple to make MFCs easier.
	PR: 246951
	Reviewed by: melifaro
	MFC after: 1 week
	Sponsored by: The FreeBSD Foundation
	Differential Revision: https://reviews.freebsd.org/D25519
	Notes: svn path=/head/; revision=362840
*	Convert rtalloc_mpath_fib() users to the new KPI.	Alexander V. Chernikov	2020-04-28	1	-5/+4
	New fib[46]_lookup() functions support multipath transparently. Given that, switch the last rtalloc_mpath_fib() calls to fib4_lookup() and eliminate the function itself.
	Note: proper flowid generation (especially for the outbound traffic) is a bigger topic and will be handled in a separate review. This change leaves flowid generation intact.
	Differential Revision: https://reviews.freebsd.org/D24595
	Notes: svn path=/head/; revision=360431
*	Convert route caching to nexthop caching.	Alexander V. Chernikov	2020-04-25	1	-15/+17
	This change is built on top of nexthop objects introduced in r359823.
	Nexthops are separate data structures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with other parts of the stack and allows faster lookup algorithms to be introduced transparently.
	Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring an rtentry reference, which is protected by a per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or the link goes down, the cache record gets withdrawn.
	Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nexthop as the cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remain the same.
	Differential Revision: https://reviews.freebsd.org/D24340
	Notes: svn path=/head/; revision=360292
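The caching scheme described above (hold a reference to the cached object, withdraw it when the table generation changes) can be sketched in a few lines. This is a single-threaded userspace illustration with hypothetical names: `struct nexthop`, `struct route_cache`, `cache_set()` and `cache_get()` are invented here, and a plain integer stands in for the refcount(9) counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified nexthop: just a reference count and an MTU. */
struct nexthop {
	int		refcnt;
	uint16_t	mtu;
};

/* Cached nexthop plus the table generation it was looked up under. */
struct route_cache {
	struct nexthop	*nh;
	uint32_t	gen;
};

/* Store a nexthop in the cache, taking a reference on it. */
void
cache_set(struct route_cache *rc, struct nexthop *nh, uint32_t table_gen)
{
	if (rc->nh != NULL)
		rc->nh->refcnt--;	/* release the previous entry */
	nh->refcnt++;
	rc->nh = nh;
	rc->gen = table_gen;
}

/*
 * Return the cached nexthop, or NULL after withdrawing it when the
 * routing table generation has moved on since it was cached.
 */
struct nexthop *
cache_get(struct route_cache *rc, uint32_t table_gen)
{
	if (rc->nh != NULL && rc->gen != table_gen) {
		rc->nh->refcnt--;
		rc->nh = NULL;
	}
	return (rc->nh);
}
```

A NULL return tells the caller to perform a fresh lookup and call `cache_set()` again with the current generation.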
*	sys/netinet: remove spurious doubled ;s	Ed Maste	2020-03-27	1	-1/+1
	Notes: svn path=/head/; revision=359381
*	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)	Pawel Biernacki	2020-02-26	1	-4/+8
	r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.
	This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.
	Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT.
	Approved by: kib (mentor, blanket)
	Commented by: kib, gallatin, melifaro
	Differential Revision: https://reviews.freebsd.org/D23718
	Notes: svn path=/head/; revision=358333
*	White space cleanup -- remove trailing tab's or spaces	Randall Stewart	2020-02-12	1	-7/+7
	from any line.
	Sponsored by: Netflix Inc.
	Notes: svn path=/head/; revision=357818
*	Widen NET_EPOCH coverage.	Gleb Smirnoff	2019-10-07	1	-14/+7
	When epoch(9) was introduced to the network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, and so were the epoch covered areas.
	However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point in minimising epoch covered areas for the sake of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win.
	Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside the network epoch. This makes coding both input and output paths way easier.
	On the output path we already enter epoch quite early - in ip_output(), in ip6_output().
	This patch does the same for the input path. All ISR processing, network related callouts, and other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch.
	The tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed.
	This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note that unlike a lock recursion, an epoch recursion is safe and just wastes a bit of resources.
	Reviewed by: gallatin, hselasky, cy, adrian, kristof
	Differential Revision: https://reviews.freebsd.org/D19111
	Notes: svn path=/head/; revision=353292
*	Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code	Rodney W. Grimes	2019-04-04	1	-3/+3
	There are a few places that use hand crafted versions of the macros from sys/netinet/in.h making it difficult to actually alter the values in use by these macros. Correct that by replacing handcrafted code with proper macro usage.
	Reviewed by: karels, kristof
	Approved by: bde (mentor)
	MFC after: 3 weeks
	Sponsored by: John Gilmore
	Differential Revision: https://reviews.freebsd.org/D19317
	Notes: svn path=/head/; revision=345888
*	New pfil(9) KPI together with newborn pfil API and control utility.	Gleb Smirnoff	2019-01-31	1	-13/+12
	The KPI has been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as the kernel uses the same command structures that the userland ioctl uses.
	In a nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together.
	New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters.
	Now it is possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of the existing packet filters support that yet, however limited usage is already possible, e.g. the default ruleset can be moved to a single interface, as soon as interfaces provide their filtering points.
	Another future feature is the possibility to create pfil heads that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when a packet has just been received by a NIC and no mbuf was yet allocated.
	Differential Revision: https://reviews.freebsd.org/D18951
	Notes: svn path=/head/; revision=343631