aboutsummaryrefslogtreecommitdiff
path: root/sys/net/if_ethersubr.c
Commit message (Collapse)AuthorAgeFilesLines
* ethernet: Move the assertion of ether header sizes back into ethernet.hZhenlei Huang2025-07-151-5/+0
| | | | | | | | | | | | | | | | | | | | | | There're lots of consumers, Ethernet drivers, libraries and applications. It is more promising to assert the sizes in every compilation units rather than in if_ethersubr.c only. Those assertions were in the header file but were moved to if_ethersubr.c due to possible conflict of the implementation of CTASSERT [1]. Now that the default C standard is now gnu17 [2] [3] which supports _Static_assert natively, use _Static_assert instead of CTASSERT to avoid possible conflicts. While here, add an extra assertion for the size of struct ether_vlan_header. [1] d54d93ac7f0f Move CTASSERT of ether header sizes out of the header file and into ... [2] ca4eddea97c5 src: Use gnu17 as the default C standard for userland instead of gnu99 [3] 3a98d5701c7f sys: Use gnu17 as the default C standard for the kernel PR: 287761 (exp-run) Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D50947
* net: ether_gen_addr: fix address generationMaxim Shalomikhin2025-07-021-3/+1
| | | | | | | | | | | | | | | | | Some errors in ether_gen_addr() caused us to generate MAC addresses out of range, and the ones that were within range had other errors causing the pool of addresses that we might actually generate to shrink. Fix both prblems by using only two bytes of the digest and then OR'ing against the mask, which has the appropriate byte set for the fourth octet of the range already; essentially, our digest is only contributing the last two octets. Change is the author, but any blame for the commit message goes to kevans. PR: 256850 Relnotes: yes
* if_ethersubr: preserve entropy of MAC addressesQuentin Thébault2025-07-021-1/+1
| | | | | | | | | | | | | | | | | | | Ethernet MAC addresses are currently generated by concatenating the first bytes of a SHA1 digest. However the digest buffer is defined as a signed char buffer, which means that any digest digit greater than 0x80 will be promoted to a negative int before the concatenation. As a result, any digest digit greater than 0x80 will overwrite the previous ones throught the application of the bitwise-or with its 0xFF higher bytes, effectively reducing the entropy of addresses generated and significantly increasing the risk of conflict. Defining the digest buffer as unsigned ensures there will be no unwanted consequences during integer promotion and the concatenation will work as expected. Signed-off-by: Quentin Thébault <quentin.thebault@defenso.fr> Closes: https://github.com/freebsd/freebsd-src/pull/1750
* if_vlan: add a prototype for vlan_input_pLexi Winter2025-06-251-2/+0
| | | | | | | | | | | | Move the definition of vlan_input_p to net/if.c and its prototype to if_vlan_var.h, to match the other functions exported from if_vlan. Remove the previous comment which is now outdated. This is required for if_bridge to use this function. Reviewed by: zlei, kp, des, kib Approved by: des (mentor) Differential Revision: https://reviews.freebsd.org/D50567
* ethernet: Set maximum Ethernet header length based on the capability ↵Zhenlei Huang2025-06-211-1/+2
| | | | | | | | | | | | IFCAP_VLAN_MTU Lots of Ethernet drivers fix the header length after ether_ifattach(). Well since the net stack can conclude it based on the capability IFCAP_VLAN_MTU, let the net stack do it rather than individual drivers. Reviewed by: markj, glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D50846
* bridge: allow IP addresses on members to be disabledLexi Winter2025-05-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | add a new sysctl, net.link.bridge.member_ifaddrs, which defaults to 1. if it is set to 1, bridge behaviour is unchanged. if it is set to 0: - an interface which has AF_INET6 or AF_INET addresses assigned cannot be added to a bridge. - an interface in a bridge cannot have an AF_INET6 or AF_INET address assigned to it. - the bridge will no longer consider the lladdrs on bridge members to be local addresses, i.e. frames sent to member lladdrs will not be processed by the host. update bridge.4 to document this behaviour, as well as the existing recommendation that IP addresses should not be configured on bridge members anyway, even if it currently partially works. in testing, setting this to 0 on a bridge with 50 member interfaces improved throughput by 22% (4.61Gb/s -> 5.67Gb/s) across two member epairs due to eliding the bridge member list walk in GRAB_OUR_PACKETS. Reviewed by: kp, des Approved by: des (mentor) Differential Revision: https://reviews.freebsd.org/D49995
* bridge: store a bridge_iflist pointer in ifnetLexi Winter2025-04-091-0/+2
| | | | | | | | | | | | | | | | | | | currently, bridge stores a pointer to its softc in ifnet. this means that when processing a packet, it needs to walk the bridge's member list to find the bridge_iflist for the interface the packet came from. instead, store a pointer to the bridge_iflist in ifnet, and add a pointer from bridge_iflist back to the bridge_softc. this means given an ifnet, we always have both the softc and the bridge_iflist without a list walk. there are two places outside if_bridge that treat ifnet->if_bridge as something other than an opaque pointer: bridgestp, and netinet. add two function pointers exported from if_bridge to handle those cases. bump __FreeBSD_version as this is technically a KABI break. Reviewed by: kp
* sys/net: fix several sysinit_cfunc_t signature mismatchesSHENGYI HONG2025-01-161-2/+2
| | | | | Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D48490
* ethernet: Retire M_HASFCSZhenlei Huang2024-07-041-10/+0
| | | | | | | | | | | | | | | | | | | The mbuf flag M_HASFCS was introduced for drivers to indicate the net stack that packets include FCS (Frame Check Sequence). In principle, to be efficient, FCS should always be processed by hardware, firmware, or at last sort the driver. Well, Ethernet specifies that damaged frames should be discarded, thus only good ones will be passed up to the net stack, then it makes no senses for the net stack to see FCS just to trim it. The last consumer of the flag M_HASFCS has been removed since change [1]. It is time to retire it. 1. 105a4f7b3cb6 ng_atmllc: remove Reviewed by: kp MFC after: never Differential Revision: https://reviews.freebsd.org/D42391
* ethernet: Fix logging of frame lengthZhenlei Huang2024-04-081-3/+3
| | | | | | | | | | Both the mbuf length and the total packet length are signed. While here, update a stall comment to reflect the current practice. Reviewed by: kp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D42390
* pfil: PFIL_PASS never frees the mbufKristof Provost2024-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed to them. (E.g. when rejecting a packet, or when gathering up packets for reassembly). If the hook returns PFIL_PASS the mbuf must still be present. Assert this in pfil_mem_common() and ensure that ipfilter follows this convention. pf and ipfw already did. Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf must have been freed (or now be owned by the firewall for further processing, like packet scheduling or reassembly). This allows us to remove a few extraneous NULL checks. Suggested by: tuexen Reviewed by: tuexen, zlei Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D43617
* Teach if_smsc to get MAC from bootargs.Ronald Klop2023-12-071-2/+8
| | | | | | | | | | | | | | | Some Raspberry Pi pass smsc95xx.macaddr=XX:XX:XX:XX:XX:XX as bootargs. Use this if no ethernet address is found in an EEPROM. As last resort fall back to ether_gen_addr() instead of random MAC. PR: 274092 Reported by: Patrick M. Hausen (via ML) Reviewed by: imp, karels, zlei Tested by: Patrick M. Hausen Approved by: karels MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D42463
* sys: Remove ancient SCCS tags.Warner Losh2023-11-271-2/+0
| | | | | | | | Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
* net: Check per-flow priority code point for untagged trafficZhenlei Huang2023-09-061-5/+16
| | | | | | | | | | | | | | | | | | | | | | Commit 868aabb4708d introduced per-flow priority. There's a defect in the logic for untagged traffic, it does not check M_VLANTAG set in the mbuf packet header or MTAG_8021Q/MTAG_8021Q_PCP_OUT tag set by firewall, then can result missing desired priority in the outbound packets. For mbuf packet with M_VLANTAG in header, some interfaces happen to work due to bug in the drivers mentioned in D39499. As modern interfaces have VLAN hardware offloading, the defect is barely noticeable unless the feature per-flow priority is widely tested. As a side effect of this defect, the soft padding to work around buggy bridges is bypassed. That may result in regression if soft padding is requested. PR: 273431 Discussed with: kib Fixes: 868aabb4708d Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39536
* net: Remove vlan metadata on pcp / vlan encapsulationZhenlei Huang2023-08-301-2/+5
| | | | | | | | | | | | | | | | | | | | | | | For oubound traffic, the flag M_VLANTAG is set in mbuf packet header to indicate the underlaying interface do hardware VLAN tag insertion if capable, otherwise the net stack will do 802.1Q encapsulation instead. Commit 868aabb4708d introduced per-flow priority which set the priority ID in the mbuf packet header. There's a corner case that when the driver is disabled to do hardware VLAN tag insertion, and the net stack do 802.1Q encapsulation, then it will result double tagged packets if the driver do not check the enabled capability (hardware VLAN tag insertion). Unfortunately some drivers, currently known cxgbe(4) re(4) ure(4) igc(4) and vmx(4), have this issue. From a quick review for other interface drivers I believe a lot more drivers have the same issue. It makes more sense to fix in net stack than to try to change every single driver. PR: 270736 Reviewed by: kp Fixes: 868aabb4708d Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39499
* net: Do not overwrite if_vlan's PCPZhenlei Huang2023-08-231-3/+4
| | | | | | | | | | | | | | | | | | | | | In commit c7cffd65c5d8 the function ether_8021q_frame() was slightly refactored to use pointer of struct ether_8021q_tag as parameter qtag to include the new option proto. It is wrong to write to qtag->pcp as it will effectively change the memory that qtag points to. Unfortunately the transmit routine of if_vlan parses pointer of the member ifv_qtag of its softc which stores vlan interface's PCP internally, when transmitting mbufs that contains PCP the vlan interface's PCP will get overwritten. Fix by operating on a local copy of qtag->pcp. Also mark 'struct ether_8021q_tag' as const so that compilers can pick up such kind of bug. PR: 273304 Reviewed by: kp Fixes: c7cffd65c5d85 Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D39505
* sys: Remove $FreeBSD$: one-line .h patternWarner Losh2023-08-161-1/+0
| | | | Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
* net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCHGleb Smirnoff2023-04-171-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Expect that drivers call into the network stack with the net epoch entered. This has already been the fact since early 2020. The net interrupts, that are marked with INTR_TYPE_NET, were entering epoch since 511d1afb6bf. For the taskqueues there is NET_TASK_INIT() and all drivers that were known back in 2020 we marked with it in 6c3e93cb5a4. However in e87c4940156 we took conservative approach and preferred to opt-in rather than opt-out for the epoch. This change not only reverts e87c4940156 but adds a safety belt to avoid panicing with INVARIANTS if there is a missed driver. With INVARIANTS we will run in_epoch() check, print a warning and enter the net epoch. A driver that prints can be quickly fixed with the IFF_NEEDSEPOCH flag, but better be augmented to properly enter the epoch itself. Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence of network epoch. Indeed, all NIC drivers that support LRO already provide the epoch, either with help of INTR_TYPE_NET or just running NET_EPOCH_ENTER() in their code. Reviewed by: zlei, gallatin, erj Differential Revision: https://reviews.freebsd.org/D39510
* bridge: Add support for emulated netmap modeMark Johnston2023-04-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | if_bridge receives packets via a special interface, if_bridge_input, rather than by if_input. Thus, netmap's usual hooking of ifnet routines does not work as expected. Instead, modify bridge_input() to pass packets directly to netmap when it is enabled. This applies to both locally delivered packets and forwarded packets. When a netmap application transmits a packet by writing it to the host TX ring, the mbuf chain is passed to if_input, which ordinarily points to ether_input(). However, when transmitting via if_bridge, bridge_input() needs to see the packet again in order to decide whether to deliver or forward. Thus, introduce a new protocol flag, M_BRIDGE_INJECT, which 1) causes the packet to be passed to bridge_input() again after Ethernet processing, and 2) avoids passing the packet back to netmap. The source MAC address of the packet is used to determine the original "receiving" interface. Reviewed by: vmaffione MFC after: 2 months Sponsored by: Zenarmor Sponsored by: OPNsense Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38066
* net: use pfil_mbuf_{in,out} where we always have an mbufGleb Smirnoff2023-02-141-3/+2
| | | | | | | This finalizes what has been started in 0b70e3e78b0. Reviewed by: kp, mjg Differential revision: https://reviews.freebsd.org/D37976
* bpf: Add "_if" tap APIsJustin Hibbits2023-01-311-0/+12
| | | | | | | | | | Summary: Hide more netstack by making the BPF_TAP macros real functions in the netstack. "struct ifnet" is used in the header instead of "if_t" to keep header pollution down. Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38103
* ifnet/API: Move struct ifnet definition to a <net/if_private.h>Justin Hibbits2023-01-241-0/+1
| | | | | | | | | | | Hide the ifnet structure definition, no user serviceable parts inside, it's a netstack implementation detail. Include it temporarily in <net/if_var.h> until all drivers are updated to use the accessors exclusively. Reviewed by: glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38046
* ether_demux: Defer stripping the Ethernet header.John Baldwin2022-11-301-7/+4
| | | | | | | | | This avoids having to undo it before invoking NetGraph's orphan input hook. Reviewed by: ae, melifaro Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D37510
* ether_resolve_addr: eh is only used for INET or INET6.John Baldwin2022-04-131-2/+1
|
* vlan: allow net.link.vlan.mtag_pcp to be set per vnetKristof Provost2022-02-141-4/+5
| | | | | | The primary reason for this change is to facilitate testing. MFC after: 1 week
* routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).Zhenlei Huang2021-08-221-5/+6
| | | | | | | | | | | | | | | | | | | | | | | Implement kernel support for RFC 5549/8950. * Relax control plane restrictions and allow specifying IPv6 gateways for IPv4 routes. This behavior is controlled by the net.route.rib_route_ipv6_nexthop sysctl (on by default). * Always pass final destination in ro->ro_dst in ip_forward(). * Use ro->ro_dst to exract packet family inside if_output() routines. Consistently use RO_GET_FAMILY() macro to handle ro=NULL case. * Pass extracted family to nd6_resolve() to get the LLE with proper encap. It leverages recent lltable changes committed in c541bd368f86. Presence of the functionality can be checked using ipv4_rfc5549_support feature(3). Example usage: route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0 Differential Revision: https://reviews.freebsd.org/D30398 MFC after: 2 weeks
* lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries.Alexander V. Chernikov2021-08-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we use pre-calculated headers inside LLE entries as prepend data for `if_output` functions. Using these headers allows saving some CPU cycles/memory accesses on the fast path. However, this approach makes adding L2 header for IPv4 traffic with IPv6 nexthops more complex, as it is not possible to store multiple pre-calculated headers inside lle. Additionally, the solution space is limited by the fact that PCB caching saves LLEs in addition to the nexthop. Thus, add support for creating special "child" LLEs for the purpose of holding custom family encaps and store mbufs pending resolution. To simplify handling of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE. Such LLEs are not visible when iterating LLE table. Their lifecycle is bound to the "parent" LLE - it is not possible to delete "child" when parent is alive. Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state machine used by the standard LLEs. nd6_lookup() and nd6_resolve() now accepts an additional argument, family, allowing to return such child LLEs. This change uses `LLE_SF()` macro which packs family and flags in a single int field. This is done to simplify merging back to stable/. Once this code lands, most of the cases will be converted to use a dedicated `family` parameter. Differential Revision: https://reviews.freebsd.org/D31379 MFC after: 2 weeks
* ether: Add a KMSAN check for transmitted framesMark Johnston2021-08-111-3/+8
| | | | | | This helps ensure that outbound packet data is initialized per KMSAN. Sponsored by: The FreeBSD Foundation
* [lltable] Unify datapath feedback mechamism.Alexander V. Chernikov2021-08-041-1/+1
| | | | | | | | | | | | | Use newly-create llentry_request_feedback(), llentry_mark_used() and llentry_get_hittime() to request datapatch usage check and fetch the results in the same fashion both in IPv4 and IPv6. While here, simplify llentry_provide_feedback() wrapper by eliminating 1 condition check. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D31390
* kern: ether_gen_addr: randomize on default hostuuid, tooKyle Evans2021-06-021-3/+14
| | | | | | | | | | | | | | | Currently, this will still hash the default (all zero) hostuuid and potentially arrive at a MAC address that has a high chance of collision if another interface of the same name appears in the same broadcast domain on another host without a hostuuid, e.g., some virtual machine setups. Instead of using the default hostuuid, just treat it as a failure and generate a random LA unicast MAC address. Reviewed by: bz, gbe, imp, kbowling, kp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29788
* bridge: Remove members when assigned to a new vnetKristof Provost2021-02-231-3/+0
| | | | | | | | | | | | When the bridge is moved to a different vnet we must remove all of its member interfaces (and span interfaces), because we don't know if those will be moved along with it. We don't want to hold references to interfaces not in our vnet. Reviewed by: donner@ MFC after: 1 week Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D28859
* Remove not needed variable initialization.Hans Petter Selasky2020-12-231-2/+2
| | | | | | | | | And switch from int to bool while at it. Reviewed by: melifaro@ Differential Revision: https://reviews.freebsd.org/D27725 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking
* Add support for IP over infiniband, IPoIB, to lagg(4). Currently onlyHans Petter Selasky2020-10-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | the failover protocol is supported due to limitations in the IPoIB architecture. Refer to the lagg(4) manual page for how to configure and use this new feature. A new network interface type, IFT_INFINIBANDLAG, has been added, similar to the existing IFT_IEEE8023ADLAG . ifconfig(8) has been updated to accept a new laggtype argument when creating lagg(4) network interfaces. This new argument is used to distinguish between ethernet and infiniband type of lagg(4) network interface. The laggtype argument is optional and defaults to ethernet. The lagg(4) command line syntax is backwards compatible. Differential Revision: https://reviews.freebsd.org/D26254 Reviewed by: melifaro@ MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking Notes: svn path=/head/; revision=366933
* Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q).Alexander V. Chernikov2020-10-211-11/+19
| | | | | | | | | | | | | | | | | | | | | 802.1ad interfaces are created with ifconfig using the "vlanproto" parameter. Eg., the following creates a 802.1Q VLAN (id #42) over a 802.1ad S-VLAN (id #5) over a physical Ethernet interface (em0). ifconfig vlan5 create vlandev em0 vlan 5 vlanproto 802.1ad up ifconfig vlan42 create vlandev vlan5 vlan 42 inet 10.5.42.1/24 VLAN_MTU, VLAN_HWCSUM and VLAN_TSO capabilities should be properly supported. VLAN_HWTAGGING is only partially supported, as there is currently no IFCAP_VLAN_* denoting the possibility to set the VLAN EtherType to anything else than 0x8100 (802.1ad uses 0x88A8). Submitted by: Olivier Piras Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D26436 Notes: svn path=/head/; revision=366917
* Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow.Richard Scheffenegger2020-10-091-0/+7
| | | | | | | | | | | | | | | | | | This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7. Note that for untagged traffic, explicitly adding a priority will insert a special 801.1Q vlan header with vlan ID = 0 to carry the priority setting Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409 Notes: svn path=/head/; revision=366569
* net: clean up empty lines in .c and .h filesMateusz Guzik2020-09-011-6/+1
| | | | Notes: svn path=/head/; revision=365071
* Use devctl.h instead of bus.h to reduce newbus pollution.Warner Losh2020-08-211-1/+1
| | | | | | | | | | There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@ Notes: svn path=/head/; revision=364442
* ether_ifattach: set mtu before calling if_attach()Kyle Evans2020-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | if_attach() -> if_attach_internal() will call if_attachdomain1(ifp) any time an ethernet interface is setup *after* SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_FIRST. This eventually leads to nd6_ifattach() -> nd6_setmtu0() stashing off ifp->if_mtu in ndi->maxmtu *before* ifp->if_mtu has been properly set in some scenarios, e.g., USB ethernet adapter plugged in later on. For interfaces that are created in early boot, we don't have this issue as domains aren't constructed enough for them to attach and thus it gets deferred to domainifattach at SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_SECOND *after* the mtu has been set earlier in ether_ifattach(). PR: 248005 Submitted by: Mathew <mjanelle blackberry com> MFC after: 1 week Notes: svn path=/head/; revision=363244
* ethersubr: Make the mac address generation more robustKristof Provost2020-04-181-7/+19
| | | | | | | | | | | | | | | If we create two (vnet) jails and create a bridge interface in each we end up with the same mac address on both bridge interfaces. These very often conflicts, resulting in same mac address in both jails. Mitigate this problem by including the jail name in the mac address. Reviewed by: kevans, melifaro MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24383 Notes: svn path=/head/; revision=360068
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)Pawel Biernacki2020-02-261-3/+5
| | | | | | | | | | | | | | | | | | | r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718 Notes: svn path=/head/; revision=358333
* Although most of the NIC drivers are epoch ready, due to peer pressureGleb Smirnoff2020-02-241-2/+5
| | | | | | | | | | | | | | | | | | | switch over to opt-in instead of opt-out for epoch. Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch when processing its packets. Now this will create recursive entrance in epoch in >90% network drivers, but will guarantee safeness of the transition. Mark several tested drivers as IFF_KNOWSEPOCH. Reviewed by: hselasky, jeff, bz, gallatin Differential Revision: https://reviews.freebsd.org/D23674 Notes: svn path=/head/; revision=358301
* Stop entering the network epoch in ether_input(), unless driverGleb Smirnoff2020-01-231-2/+4
| | | | | | | is marked with IFF_NEEDSEPOCH. Notes: svn path=/head/; revision=357012
* Widen NET_EPOCH coverage.Gleb Smirnoff2019-10-071-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111 Notes: svn path=/head/; revision=353292
* style(9): remove extraneous empty linesGleb Smirnoff2019-09-251-1/+0
| | | | Notes: svn path=/head/; revision=352725
* Restructure mbuf send tags to provide stronger guarantees.John Baldwin2019-05-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code alloating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117 Notes: svn path=/head/; revision=348254
* net: adjust randomized address bitsKyle Evans2019-04-171-11/+34
| | | | | | | | | | | | | Give devices that need a MAC a 16-bit allocation out of the FreeBSD Foundation OUI range. Change the name ether_fakeaddr to ether_gen_addr now that we're dealing real MAC addresses with a real OUI rather than random locally-administered addresses. Reviewed by: bz, rgrimes Differential Revision: https://reviews.freebsd.org/D19587 Notes: svn path=/head/; revision=346324
* ether_fakeaddr: Use 'b' 's' 'd' for the prefixKyle Evans2019-03-141-5/+6
| | | | | | | | | | | | | This has the advantage of being obvious to sniff out the designated prefix by eye and it has all the right bits set. Comment stolen from ffec. I've removed bryanv@'s pending question of using the FreeBSD OUI range -- no one has followed up on this with a definitive action, and there's no particular reason to shoot for it and the administrative overhead that comes with deciding exactly how to use it. Notes: svn path=/head/; revision=345151
* ether: centralize fake hwaddr generationKyle Evans2019-03-141-0/+14
| | | | | | | | | | | | We currently have two places with identical fake hwaddr generation -- if_vxlan and if_bridge. Lift it into if_ethersubr for reuse in other interfaces that may also need a fake addr. Reviewed by: bryanv, kp, philip Differential Revision: https://reviews.freebsd.org/D19573 Notes: svn path=/head/; revision=345139
* Update for IETF draft-ietf-6man-ipv6only-flag.Bjoern A. Zeeb2019-03-061-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | All changes are hidden behind the EXPERIMENTAL option and are not compiled in by default. Add ND6_IFF_IPV6_ONLY_MANUAL to be able to set the interface into no-IPv4-mode manually without router advertisement options. This will allow developers to test software for the appropriate behaviour even on dual-stack networks or IPv6-Only networks without the option being set in RA messages. Update ifconfig to allow setting and displaying the flag. Update the checks for the filters to check for either the automatic or the manual flag to be set. Add REVARP to the list of filtered IPv4-related protocols and add an input filter similar to the output filter. Add a check, when receiving the IPv6-Only RA flag to see if the receiving interface has any IPv4 configured. If it does, ignore the IPv6-Only flag. Add a per-VNET global sysctl, which is on by default, to not process the automatic RA IPv6-Only flag. This way an administrator (if this is compiled in) has control over the behaviour in case the node still relies on IPv4. Notes: svn path=/head/; revision=344859
* New pfil(9) KPI together with newborn pfil API and control utility.Gleb Smirnoff2019-01-311-24/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as kernel uses same command structures that userland ioctl uses. In nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together. New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters. Now it possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of existing packet filters yet support that, however limited usage is already possible, e.g. default ruleset can be moved to single interface, as soon as interface would pride their filtering points. Another future feature is possiblity to create pfil heads, that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when packet has just been received by a NIC and no mbuf was yet allocated. Differential Revision: https://reviews.freebsd.org/D18951 Notes: svn path=/head/; revision=343631