| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There're lots of consumers, Ethernet drivers, libraries and applications.
It is more promising to assert the sizes in every compilation units
rather than in if_ethersubr.c only.
Those assertions were in the header file but were moved to if_ethersubr.c
due to possible conflict of the implementation of CTASSERT [1]. Now that
the default C standard is now gnu17 [2] [3] which supports _Static_assert
natively, use _Static_assert instead of CTASSERT to avoid possible
conflicts.
While here, add an extra assertion for the size of struct ether_vlan_header.
[1] d54d93ac7f0f Move CTASSERT of ether header sizes out of the header file and into ...
[2] ca4eddea97c5 src: Use gnu17 as the default C standard for userland instead of gnu99
[3] 3a98d5701c7f sys: Use gnu17 as the default C standard for the kernel
PR: 287761 (exp-run)
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D50947
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some errors in ether_gen_addr() caused us to generate MAC addresses out
of range, and the ones that were within range had other errors causing
the pool of addresses that we might actually generate to shrink.
Fix both prblems by using only two bytes of the digest and then OR'ing
against the mask, which has the appropriate byte set for the fourth
octet of the range already; essentially, our digest is only contributing
the last two octets.
Change is the author, but any blame for the commit message goes to
kevans.
PR: 256850
Relnotes: yes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ethernet MAC addresses are currently generated by concatenating the
first bytes of a SHA1 digest. However the digest buffer is defined as a
signed char buffer, which means that any digest digit greater than 0x80
will be promoted to a negative int before the concatenation.
As a result, any digest digit greater than 0x80 will overwrite the
previous ones throught the application of the bitwise-or with its 0xFF
higher bytes, effectively reducing the entropy of addresses generated
and significantly increasing the risk of conflict.
Defining the digest buffer as unsigned ensures there will be no unwanted
consequences during integer promotion and the concatenation will work as
expected.
Signed-off-by: Quentin Thébault <quentin.thebault@defenso.fr>
Closes: https://github.com/freebsd/freebsd-src/pull/1750
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the definition of vlan_input_p to net/if.c and its prototype to
if_vlan_var.h, to match the other functions exported from if_vlan.
Remove the previous comment which is now outdated.
This is required for if_bridge to use this function.
Reviewed by: zlei, kp, des, kib
Approved by: des (mentor)
Differential Revision: https://reviews.freebsd.org/D50567
|
|
|
|
|
|
|
|
|
|
|
|
| |
IFCAP_VLAN_MTU
Lots of Ethernet drivers fix the header length after ether_ifattach().
Well since the net stack can conclude it based on the capability
IFCAP_VLAN_MTU, let the net stack do it rather than individual drivers.
Reviewed by: markj, glebius
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50846
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
add a new sysctl, net.link.bridge.member_ifaddrs, which defaults to 1.
if it is set to 1, bridge behaviour is unchanged.
if it is set to 0:
- an interface which has AF_INET6 or AF_INET addresses assigned cannot
be added to a bridge.
- an interface in a bridge cannot have an AF_INET6 or AF_INET address
assigned to it.
- the bridge will no longer consider the lladdrs on bridge members to be
local addresses, i.e. frames sent to member lladdrs will not be
processed by the host.
update bridge.4 to document this behaviour, as well as the existing
recommendation that IP addresses should not be configured on bridge
members anyway, even if it currently partially works.
in testing, setting this to 0 on a bridge with 50 member interfaces
improved throughput by 22% (4.61Gb/s -> 5.67Gb/s) across two member
epairs due to eliding the bridge member list walk in GRAB_OUR_PACKETS.
Reviewed by: kp, des
Approved by: des (mentor)
Differential Revision: https://reviews.freebsd.org/D49995
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
currently, bridge stores a pointer to its softc in ifnet. this means
that when processing a packet, it needs to walk the bridge's member list
to find the bridge_iflist for the interface the packet came from.
instead, store a pointer to the bridge_iflist in ifnet, and add a
pointer from bridge_iflist back to the bridge_softc. this means given
an ifnet, we always have both the softc and the bridge_iflist without a
list walk.
there are two places outside if_bridge that treat ifnet->if_bridge as
something other than an opaque pointer: bridgestp, and netinet. add two
function pointers exported from if_bridge to handle those cases.
bump __FreeBSD_version as this is technically a KABI break.
Reviewed by: kp
|
|
|
|
|
| |
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D48490
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The mbuf flag M_HASFCS was introduced for drivers to indicate the net
stack that packets include FCS (Frame Check Sequence). In principle, to
be efficient, FCS should always be processed by hardware, firmware, or
at last sort the driver. Well, Ethernet specifies that damaged frames
should be discarded, thus only good ones will be passed up to the net
stack, then it makes no senses for the net stack to see FCS just to trim
it.
The last consumer of the flag M_HASFCS has been removed since change [1].
It is time to retire it.
1. 105a4f7b3cb6 ng_atmllc: remove
Reviewed by: kp
MFC after: never
Differential Revision: https://reviews.freebsd.org/D42391
|
|
|
|
|
|
|
|
|
|
| |
Both the mbuf length and the total packet length are signed.
While here, update a stall comment to reflect the current practice.
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).
If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).
This allows us to remove a few extraneous NULL checks.
Suggested by: tuexen
Reviewed by: tuexen, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43617
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some Raspberry Pi pass smsc95xx.macaddr=XX:XX:XX:XX:XX:XX as bootargs.
Use this if no ethernet address is found in an EEPROM.
As last resort fall back to ether_gen_addr() instead of random MAC.
PR: 274092
Reported by: Patrick M. Hausen (via ML)
Reviewed by: imp, karels, zlei
Tested by: Patrick M. Hausen
Approved by: karels
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D42463
|
|
|
|
|
|
|
|
| |
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 868aabb4708d introduced per-flow priority. There's a defect in the
logic for untagged traffic, it does not check M_VLANTAG set in the mbuf
packet header or MTAG_8021Q/MTAG_8021Q_PCP_OUT tag set by firewall, then
can result missing desired priority in the outbound packets.
For mbuf packet with M_VLANTAG in header, some interfaces happen to work
due to bug in the drivers mentioned in D39499. As modern interfaces have
VLAN hardware offloading, the defect is barely noticeable unless the
feature per-flow priority is widely tested.
As a side effect of this defect, the soft padding to work around buggy
bridges is bypassed. That may result in regression if soft padding is
requested.
PR: 273431
Discussed with: kib
Fixes: 868aabb4708d Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39536
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For oubound traffic, the flag M_VLANTAG is set in mbuf packet header to
indicate the underlaying interface do hardware VLAN tag insertion if
capable, otherwise the net stack will do 802.1Q encapsulation instead.
Commit 868aabb4708d introduced per-flow priority which set the priority ID
in the mbuf packet header. There's a corner case that when the driver is
disabled to do hardware VLAN tag insertion, and the net stack do 802.1Q
encapsulation, then it will result double tagged packets if the driver do
not check the enabled capability (hardware VLAN tag insertion).
Unfortunately some drivers, currently known cxgbe(4) re(4) ure(4) igc(4)
and vmx(4), have this issue. From a quick review for other interface
drivers I believe a lot more drivers have the same issue. It makes more
sense to fix in net stack than to try to change every single driver.
PR: 270736
Reviewed by: kp
Fixes: 868aabb4708d Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39499
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit c7cffd65c5d8 the function ether_8021q_frame() was slightly
refactored to use pointer of struct ether_8021q_tag as parameter qtag to
include the new option proto.
It is wrong to write to qtag->pcp as it will effectively change the memory
that qtag points to. Unfortunately the transmit routine of if_vlan parses
pointer of the member ifv_qtag of its softc which stores vlan interface's
PCP internally, when transmitting mbufs that contains PCP the vlan
interface's PCP will get overwritten.
Fix by operating on a local copy of qtag->pcp. Also mark 'struct ether_8021q_tag'
as const so that compilers can pick up such kind of bug.
PR: 273304
Reviewed by: kp
Fixes: c7cffd65c5d85 Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39505
|
|
|
|
| |
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6bf. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a4. However in e87c4940156 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.
This change not only reverts e87c4940156 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch. A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.
Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence
of network epoch. Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.
Reviewed by: zlei, gallatin, erj
Differential Revision: https://reviews.freebsd.org/D39510
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if_bridge receives packets via a special interface, if_bridge_input,
rather than by if_input. Thus, netmap's usual hooking of ifnet routines
does not work as expected. Instead, modify bridge_input() to pass
packets directly to netmap when it is enabled. This applies to both
locally delivered packets and forwarded packets.
When a netmap application transmits a packet by writing it to the host
TX ring, the mbuf chain is passed to if_input, which ordinarily points
to ether_input(). However, when transmitting via if_bridge,
bridge_input() needs to see the packet again in order to decide whether
to deliver or forward. Thus, introduce a new protocol flag,
M_BRIDGE_INJECT, which 1) causes the packet to be passed to
bridge_input() again after Ethernet processing, and 2) avoids passing
the packet back to netmap. The source MAC address of the packet is used
to determine the original "receiving" interface.
Reviewed by: vmaffione
MFC after: 2 months
Sponsored by: Zenarmor
Sponsored by: OPNsense
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38066
|
|
|
|
|
|
|
| |
This finalizes what has been started in 0b70e3e78b0.
Reviewed by: kp, mjg
Differential revision: https://reviews.freebsd.org/D37976
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Hide more netstack by making the BPF_TAP macros real functions in the
netstack. "struct ifnet" is used in the header instead of "if_t" to
keep header pollution down.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38103
|
|
|
|
|
|
|
|
|
|
|
| |
Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail. Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.
Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38046
|
|
|
|
|
|
|
|
|
| |
This avoids having to undo it before invoking NetGraph's orphan input
hook.
Reviewed by: ae, melifaro
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D37510
|
| |
|
|
|
|
|
|
| |
The primary reason for this change is to facilitate testing.
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement kernel support for RFC 5549/8950.
* Relax control plane restrictions and allow specifying IPv6 gateways
for IPv4 routes. This behavior is controlled by the
net.route.rib_route_ipv6_nexthop sysctl (on by default).
* Always pass final destination in ro->ro_dst in ip_forward().
* Use ro->ro_dst to exract packet family inside if_output() routines.
Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.
* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
It leverages recent lltable changes committed in c541bd368f86.
Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0
Differential Revision: https://reviews.freebsd.org/D30398
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we use pre-calculated headers inside LLE entries as prepend data
for `if_output` functions. Using these headers allows saving some
CPU cycles/memory accesses on the fast path.
However, this approach makes adding L2 header for IPv4 traffic with IPv6
nexthops more complex, as it is not possible to store multiple
pre-calculated headers inside lle. Additionally, the solution space is
limited by the fact that PCB caching saves LLEs in addition to the nexthop.
Thus, add support for creating special "child" LLEs for the purpose of holding
custom family encaps and store mbufs pending resolution. To simplify handling
of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
to the "parent" LLE - it is not possible to delete "child" when parent is alive.
Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state
machine used by the standard LLEs.
nd6_lookup() and nd6_resolve() now accepts an additional argument, family,
allowing to return such child LLEs. This change uses `LLE_SF()` macro which
packs family and flags in a single int field. This is done to simplify merging
back to stable/. Once this code lands, most of the cases will be converted to
use a dedicated `family` parameter.
Differential Revision: https://reviews.freebsd.org/D31379
MFC after: 2 weeks
|
|
|
|
|
|
| |
This helps ensure that outbound packet data is initialized per KMSAN.
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use newly-create llentry_request_feedback(),
llentry_mark_used() and llentry_get_hittime() to
request datapatch usage check and fetch the results
in the same fashion both in IPv4 and IPv6.
While here, simplify llentry_provide_feedback() wrapper
by eliminating 1 condition check.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, this will still hash the default (all zero) hostuuid and
potentially arrive at a MAC address that has a high chance of collision
if another interface of the same name appears in the same broadcast
domain on another host without a hostuuid, e.g., some virtual machine
setups.
Instead of using the default hostuuid, just treat it as a failure and
generate a random LA unicast MAC address.
Reviewed by: bz, gbe, imp, kbowling, kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29788
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the bridge is moved to a different vnet we must remove all of its
member interfaces (and span interfaces), because we don't know if those
will be moved along with it. We don't want to hold references to
interfaces not in our vnet.
Reviewed by: donner@
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D28859
|
|
|
|
|
|
|
|
|
| |
And switch from int to bool while at it.
Reviewed by: melifaro@
Differential Revision: https://reviews.freebsd.org/D27725
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the failover protocol is supported due to limitations in the IPoIB
architecture. Refer to the lagg(4) manual page for how to configure
and use this new feature. A new network interface type,
IFT_INFINIBANDLAG, has been added, similar to the existing
IFT_IEEE8023ADLAG .
ifconfig(8) has been updated to accept a new laggtype argument when
creating lagg(4) network interfaces. This new argument is used to
distinguish between ethernet and infiniband type of lagg(4) network
interface. The laggtype argument is optional and defaults to
ethernet. The lagg(4) command line syntax is backwards compatible.
Differential Revision: https://reviews.freebsd.org/D26254
Reviewed by: melifaro@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Notes:
svn path=/head/; revision=366933
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
802.1ad interfaces are created with ifconfig using the "vlanproto" parameter.
Eg., the following creates a 802.1Q VLAN (id #42) over a 802.1ad S-VLAN
(id #5) over a physical Ethernet interface (em0).
ifconfig vlan5 create vlandev em0 vlan 5 vlanproto 802.1ad up
ifconfig vlan42 create vlandev vlan5 vlan 42 inet 10.5.42.1/24
VLAN_MTU, VLAN_HWCSUM and VLAN_TSO capabilities should be properly
supported. VLAN_HWTAGGING is only partially supported, as there is
currently no IFCAP_VLAN_* denoting the possibility to set the VLAN
EtherType to anything else than 0x8100 (802.1ad uses 0x88A8).
Submitted by: Olivier Piras
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D26436
Notes:
svn path=/head/; revision=366917
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt)
option IP(V6)_VLAN_PCP, which can be set to -1 (interface
default), or explicitly to any priority between 0 and 7.
Note that for untagged traffic, explicitly adding a
priority will insert a special 801.1Q vlan header with
vlan ID = 0 to carry the priority setting
Reviewed by: gallatin, rrs
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26409
Notes:
svn path=/head/; revision=366569
|
|
|
|
| |
Notes:
svn path=/head/; revision=365071
|
|
|
|
|
|
|
|
|
|
| |
There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.
Suggested by: kib@
Notes:
svn path=/head/; revision=364442
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if_attach() -> if_attach_internal() will call if_attachdomain1(ifp) any time
an ethernet interface is setup *after*
SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_FIRST. This eventually leads to
nd6_ifattach() -> nd6_setmtu0() stashing off ifp->if_mtu in ndi->maxmtu
*before* ifp->if_mtu has been properly set in some scenarios, e.g., USB
ethernet adapter plugged in later on.
For interfaces that are created in early boot, we don't have this issue as
domains aren't constructed enough for them to attach and thus it gets
deferred to domainifattach at SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_SECOND
*after* the mtu has been set earlier in ether_ifattach().
PR: 248005
Submitted by: Mathew <mjanelle blackberry com>
MFC after: 1 week
Notes:
svn path=/head/; revision=363244
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we create two (vnet) jails and create a bridge interface in each we end up
with the same mac address on both bridge interfaces.
These very often conflicts, resulting in same mac address in both jails.
Mitigate this problem by including the jail name in the mac address.
Reviewed by: kevans, melifaro
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24383
Notes:
svn path=/head/; revision=360068
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Notes:
svn path=/head/; revision=358333
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
switch over to opt-in instead of opt-out for epoch.
Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.
Now this will create recursive entrance in epoch in >90% network
drivers, but will guarantee safeness of the transition.
Mark several tested drivers as IFF_KNOWSEPOCH.
Reviewed by: hselasky, jeff, bz, gallatin
Differential Revision: https://reviews.freebsd.org/D23674
Notes:
svn path=/head/; revision=358301
|
|
|
|
|
|
|
| |
is marked with IFF_NEEDSEPOCH.
Notes:
svn path=/head/; revision=357012
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
Notes:
svn path=/head/; revision=353292
|
|
|
|
| |
Notes:
svn path=/head/; revision=352725
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Perform ifp mismatch checks (to determine if a send tag is allocated
for a different ifp than the one the packet is being output on), in
ip_output() and ip6_output(). This avoids sending packets with send
tags to ifnet drivers that don't support send tags.
Since we are now checking for ifp mismatches before invoking
if_output, we can now try to allocate a new tag before invoking
if_output sending the original packet on the new tag if allocation
succeeds.
To avoid code duplication for the fragment and unfragmented cases,
add ip_output_send() and ip6_output_send() as wrappers around
if_output and nd6_output_ifp, respectively. All of the logic for
setting send tags and dealing with send tag-related errors is done
in these wrapper functions.
For pseudo interfaces that wrap other network interfaces (vlan and
lagg), wrapper send tags are now allocated so that ip*_output see
the wrapper ifp as the ifp in the send tag. The if_transmit
routines rewrite the send tags after performing an ifp mismatch
check. If an ifp mismatch is detected, the transmit routines fail
with EAGAIN.
- To provide clearer life cycle management of send tags, especially
in the presence of vlan and lagg wrapper tags, add a reference count
to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
Provide a helper function (m_snd_tag_init()) for use by drivers
supporting send tags. m_snd_tag_init() takes care of the if_ref
on the ifp meaning that code alloating send tags via if_snd_tag_alloc
no longer has to manage that manually. Similarly, m_snd_tag_rele
drops the refcount on the ifp after invoking if_snd_tag_free when
the last reference to a send tag is dropped.
This also closes use after free races if there are pending packets in
driver tx rings after the socket is closed (e.g. from tcpdrop).
In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
Drivers now also check this flag instead of checking snd_tag against
NULL. This avoids false positive matches when a forwarded packet
has a non-NULL rcvif that was treated as a send tag.
- cxgbe was relying on snd_tag_free being called when the inp was
detached so that it could kick the firmware to flush any pending
work on the flow. This is because the driver doesn't require ACK
messages from the firmware for every request, but instead does a
kind of manual interrupt coalescing by only setting a flag to
request a completion on a subset of requests. If all of the
in-flight requests don't have the flag when the tag is detached from
the inp, the flow might never return the credits. The current
snd_tag_free command issues a flush command to force the credits to
return. However, the credit return is what also frees the mbufs,
and since those mbufs now hold references on the tag, this meant
that snd_tag_free would never be called.
To fix, explicitly drop the mbuf's reference on the snd tag when the
mbuf is queued in the firmware work queue. This means that once the
inp's reference on the tag goes away and all in-flight mbufs have
been queued to the firmware, tag's refcount will drop to zero and
snd_tag_free will kick in and send the flush request. Note that we
need to avoid doing this in the middle of ethofld_tx(), so the
driver grabs a temporary reference on the tag around that loop to
defer the free to the end of the function in case it sends the last
mbuf to the queue after the inp has dropped its reference on the
tag.
- mlx5 preallocates send tags and was using the ifp pointer even when
the send tag wasn't in use. Explicitly use the ifp from other data
structures instead.
- Sprinkle some assertions in various places to assert that received
packets don't have a send tag, and that other places that overwrite
rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.
Reviewed by: gallatin, hselasky, rgrimes, ae
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20117
Notes:
svn path=/head/; revision=348254
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Give devices that need a MAC a 16-bit allocation out of the FreeBSD
Foundation OUI range. Change the name ether_fakeaddr to ether_gen_addr now
that we're dealing real MAC addresses with a real OUI rather than random
locally-administered addresses.
Reviewed by: bz, rgrimes
Differential Revision: https://reviews.freebsd.org/D19587
Notes:
svn path=/head/; revision=346324
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has the advantage of being obvious to sniff out the designated prefix
by eye and it has all the right bits set. Comment stolen from ffec.
I've removed bryanv@'s pending question of using the FreeBSD OUI range --
no one has followed up on this with a definitive action, and there's no
particular reason to shoot for it and the administrative overhead that comes
with deciding exactly how to use it.
Notes:
svn path=/head/; revision=345151
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently have two places with identical fake hwaddr generation --
if_vxlan and if_bridge. Lift it into if_ethersubr for reuse in other
interfaces that may also need a fake addr.
Reviewed by: bryanv, kp, philip
Differential Revision: https://reviews.freebsd.org/D19573
Notes:
svn path=/head/; revision=345139
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All changes are hidden behind the EXPERIMENTAL option and are not compiled
in by default.
Add ND6_IFF_IPV6_ONLY_MANUAL to be able to set the interface into no-IPv4-mode
manually without router advertisement options. This will allow developers to
test software for the appropriate behaviour even on dual-stack networks or
IPv6-Only networks without the option being set in RA messages.
Update ifconfig to allow setting and displaying the flag.
Update the checks for the filters to check for either the automatic or the manual
flag to be set. Add REVARP to the list of filtered IPv4-related protocols and add
an input filter similar to the output filter.
Add a check, when receiving the IPv6-Only RA flag to see if the receiving
interface has any IPv4 configured. If it does, ignore the IPv6-Only flag.
Add a per-VNET global sysctl, which is on by default, to not process the automatic
RA IPv6-Only flag. This way an administrator (if this is compiled in) has control
over the behaviour in case the node still relies on IPv4.
Notes:
svn path=/head/; revision=344859
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.
In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.
New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.
Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.
Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.
Differential Revision: https://reviews.freebsd.org/D18951
Notes:
svn path=/head/; revision=343631
|