| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On platforms other than amd64, BIOCSRTIMEOUT is equal to
BIOCSRTIMEOUT32. Therefore, running the COMPAT_FREEBSD32 code
basically clears tv_usec on big endian platforms. When tcpdump is
used, the timeout requested is 100ms, which gets cleared to 0 on
ppc64 platforms. This results in tcpdump showing the packets only
when the read buffer is full.
Thanks to kib for guiding me to the correct fix.
Reported by: ivy
Reviewed by: adrian, kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D56399
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Make loopback packet drops more obvious by reporting them
in interface stats visable via netstat -ni
Since loopback uses netisr, packets can be dropped if the
netisr queue overflows. These drops are visible via
netisr -Q, but its not an obvious place to look.
Differential Revision: https://reviews.freebsd.org/D56356
Reviewed by: glebius, tuexen
Sponsored by: Netflix
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for CMIS based optics, typically used by 400GbE
and faster ethernet optics. The CMIS standard requires paged
support for i2c ioctls.
This has been tested on an Nvidia ConnectX-7 and Broadcom
Thor2 400GbE NIC, and I have verified that optics vendor information,
light levels, and temperatures match the information provided by
various vendor tools.
Differential Revision: https://reviews.freebsd.org/D56265
Reviewed by: kbowling, sumit.saxena_broadcom.com
Sponsored by: Netflix
|
| | |
|
| |
|
|
|
|
|
|
|
| |
FIB_NH_LOG calls the `nhop_get_upper_family(nh)` to read
`nh->nh_priv->nh_upper_family` for failure logging.
Call FIB_NH_LOG before freeing nh so failures are logged
without causing a panic.
MFC after: 3 days
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When IFDI_ATTACH_POST() fails (or netmap attach fails), iflib tears down with
ether_ifdetach(), taskqueue_free(ifc_tq), and IFDI_DETACH(). CTX_LOCK is still
held after ether_ifattach. ether_ifdetach() and taskqueue_drain(admin) must not
run under CTX_LOCK.
Teardown ordering (match iflib_device_deregister):
- Free the per-interface admin taskqueue after IFDI_DETACH / IFDI_QUEUES_FREE, not before.
- Drop IFNET_WLOCK() across IFDI_DETACH / IFDI_QUEUES_FREE so driver detach can sleep in
LinuxKPI workqueue drain, then retake IFNET_WLOCK() before iflib_free_intr_mem and fail_unlock.
MFC after: 2 weeks
Reviewed by: gallatin, kgalazka, #iflib
Differential Revision: https://reviews.freebsd.org/D56316
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Observed below kernel panic calltrace while performing sysctl -a
operation while unloading the if_bnxt driver,
Fatal trap 9: general protection fault while in kernel mode
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02a7569940
vpanic() at vpanic+0x136/frame 0xfffffe02a7569a70
panic() at panic+0x43/frame 0xfffffe02a7569ad0
trap_fatal() at trap_fatal+0x68/frame 0xfffffe02a7569af0
calltrap() at calltrap+0x8/frame 0xfffffe02a7569af0
trap 0x9, rip = 0xffffffff80c0b411, rsp = 0xfffffe02a7569bc0, rbp = 0xfffffe02a7569be0 ---
sysctl_handle_counter_u64() at sysctl_handle_counter_u64+0x61/frame 0xfffffe02a7569be0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 0xfffffe02a7569c30
sysctl_root() at sysctl_root+0x22f/frame 0xfffffe02a7569cb0
userland_sysctl() at userland_sysctl+0x196/frame 0xfffffe02a7569d50
sys___sysctl() at sys___sysctl+0x65/frame 0xfffffe02a7569e00
amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe02a7569f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe02a7569f30
Root Cause:
iflib adds per-device sysctl nodes under the device tree using the device
sysctl context. Some of those nodes are counter sysctl that point at fields
inside txq→ift_br. When the if_bnxt driver is unloaded, iflib_device_deregister
runs and calls iflib_tx_structures_free, which frees the txqs ift_br. The device
sysctl tree is only freed when the device is destroyed. If sysctl -a runs during
unload, it can still traverse the device tree and call sysctl_handle_counter_u64
for those nodes. The handler does counter_u64_fetch(*(counter_u64_t *)arg1).
By then arg1 can point into freed memory and leads to use after free type kernel panic.
Fix:
flib now uses its own sysctl context for all iflib-related nodes
instead of using device’s context. And iflib sysctl context is now
removed before any queue/ring memory is freed.
MFC after: 2 weeks
Reviewed by: gallatin, ssaxena, #iflib
Differential Revision: https://reviews.freebsd.org/D55981
|
| |
|
|
|
|
|
|
|
|
|
|
| |
geneve creates a generic network virtualization tunnel interface
for Tentant Systems over an L3 (IP/UDP) underlay network that provides
a Layer 2 (ethernet) or Layer 3 service using the geneve protocol.
This implementation is based on RFC8926.
Reviewed by: glebius, adrian
Discussed with: zlei, kp
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D54172
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The change e133271fc1b5e introduced ifnet_detach_sxlock, and change
6d2a10d96fb5 widened its coverage, but there are still consumers,
net80211 and tuntap e.g., want it. Instead of sprinkling it everywhere,
make it opaque to consumers.
Out of tree drivers shall also benefit from this change.
Reviewed by: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D56298
|
| |
|
|
|
|
|
|
|
|
|
|
| |
SIOCSIFVNET is not a hardware ioctl. Move it to where it belongs.
Where here, rewrite the logic of checking whether we are moving the
interface from and to the same vnet or not, since it is obviously not
stable to access the interface's vnet, given the current thread may
race with other threads those running if_vmove().
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D55880
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ifnet_detach_sxlock
vnet_if_return() will be invocked by vnet_sysuninit() on vnet destructing,
while the lock ifnet_detach_sxlock has been acquired in vnet_destroy()
already.
With this change the order of locking is more clear. There should be no
functional change.
Reviewed by: pouria
Fixes: 868bf82153e8 if: avoid interface destroy race
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D56288
|
| |
|
|
|
| |
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D56113
|
| |
|
|
|
|
|
|
|
| |
Functional change is that on destruction INVARIANTS checks will run. Also
the mask is no longer hardcoded, so makes it easier to make hash size a
tunable.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D56176
|
| |
|
|
|
|
|
|
|
|
|
| |
The draft-ietf-6man-ipv6only-flag has been obsoleted by RFC 8925.
Remove the EXPERIMENTAL compile option from the kernel and remove
DRAFT_IETF_6MAN_IPV6ONLY_FLAG from userland.
This compile option was not enabled by default.
Also regenerate src.conf.5.
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D56228
|
| |
|
|
|
|
|
|
|
|
|
| |
To be more robust since the checking is now performed where the
interface is referenced.
While here, remove a redundant check from if_vmove_loan().
Reviewed by: kp, glebius, pouria
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D55875
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds page & bank fields to ifi2creq in preparation
for adding CMIS support for 400g optics to ifconfig.
The new ioctl SIOCGI2CPB is added, so that drivers can distinguish
between callers asking for page/bank selection and legacy callers
that simply failed to zero out all ifi2creq fields.
The mlx5en(4) driver and iflib(4) driver frameork have been updated
to use this new SIOCGI2CPB ioctl and support page/bank operations.
A follow-on patchset will add support to ifconfig for reporting
data from CMIS optics.
This has been tested on Nvidia ConnectX-7 and Broadcom Thor2 (using
out of tree driver) based NICs.
Differential Revision: https://reviews.freebsd.org/D55912
Sponsored by: Netflix Inc.
Reviewed by: kib
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The multicast routing code will start implementing per-FIB routing
tables. As a part of this, it needs to be notified when the number of
FIBs changes, so that it can expand its tables.
Add an eventhandler for this purpose.
MFC after: 2 weeks
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D55239
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The thread running if_vmove_reclaim() may race with other threads those
running if_detach(), if_vmove_loan() or if_vmove_reclaim(). In case the
current thread loses race, two issues arise,
1. It is unstable and unsafe to access ifp->if_vnet,
2. The interface is removed from "active" list, hence if_unlink_ifnet()
can fail.
For the first case, check against source prison's vnet instead, given
the interface is obtained from that vnet.
For the second one, return ENODEV to indicate the interface was on the
list but the current thread loses race, to distinguish from ENXIO, which
means the interface or child prison is not found. This is the same with
if_vmove_loan().
Reviewed by: kp, pouria
Fixes: a779388f8bb3 if: Protect V_ifnet in vnet_if_return()
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D55997
|
| |
|
|
|
|
|
|
| |
Fix incorrect removal of opt_route.h header in route_ctl.c
Reported by: Jenkins
Fixes: 254b23eb1f54 ("routing: Retire ROUTE_MPATH compile option")
Differential Revision: https://reviews.freebsd.org/D55884
|
| |
|
|
|
|
|
|
|
|
|
| |
The ROUTE_MPATH compile option was introduced to
test the new multipath implementation.
Since compiling it has no overhead and it's enabled
by default, remove it.
Reviewed by: melifaro, markj
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D55884
|
| |
|
|
|
|
|
| |
- s/Circiut/Circuit/
Obtained from: OpenBSD
MFC after: 3 days
|
| |
|
|
|
|
|
| |
bridge doesn't require to enter epoch during destruction.
Reviewed by: zlei, glebius
Differential Revision: https://reviews.freebsd.org/D55935
|
| |
|
|
|
|
|
|
|
|
|
|
| |
bridge tries to run callout_drain(9) twice under epoch
during destruction.
once for bridge_timer, which is not required to be under epoch.
second time for the BSTP callout, which is already disabled
earlier inside bridge_delete_member.
Reviewed by: glebius, zlei
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D55876
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ioctls SIOCSIFVNET and SIOCSIFRVNET are for userland only. For
SIOCSIFVNET, if_vmove_loan(), the interface is obtained from current
VNET. For SIOCSIFRVNET, if_vmove_reclaim(), a valid child prison is
held before getting the interface. In both cases the VNET of the
obtained interfaces is stable, so there's no need to check it.
No functional change intended.
Reviewed by: glebius, jamie (for #jails)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D55828
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
It should be decreased only when the interface has been successfully
removed from the "active" list.
This prevents vnet_if_return() from potential OOB writes to the
allocated memory "pending".
Reviewed by: kp, pouria
Fixes: a779388f8bb3 if: Protect V_ifnet in vnet_if_return()
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D55873
|
| |
|
|
|
|
|
|
|
| |
Added optional system tunable parameter to enable
4-tuple rss udp hashing.
Signed-off-by: bigJ <bigj@solanavibestation.com>
Reviewed by: adrian, pouria
Pull Request: https://github.com/freebsd/freebsd-src/pull/2057
|
| |
|
|
|
|
|
|
|
| |
Changes: https://raw.githubusercontent.com/the-tcpdump-group/libpcap/89e982c37c36ad0bf9f10b7ded421cb42422effa/CHANGES
Reviewed by: bms, emaste
Obtained from: https://www.tcpdump.org/release/libpcap-1.10.6.tar.gz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55545
Differential Revision: https://reviews.freebsd.org/D55858
|
| |
|
|
| |
The module constructs UDP packets, but doesn't use the UDP stack.
|
| | |
|
| |
|
|
|
|
|
|
|
| |
All supported stable branches use netlink(4) API to configure carp(4).
The deleted code also has kernel stack leak vulnerability, that requires
extra effort to fix.
Reviewed by: pouria, kp
Differential Revision: https://reviews.freebsd.org/D55804
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some 32-bit architectures, e.g., armv7, require strict 8-byte
alignment while doing atomic 64-bit access. Hence aligning to the
pointer type (4-byte alignment) does not meet the requirement on
those architectures.
Make the space allocated by vnet_data_alloc() sufficent aligned to
avoid unaligned access.
PR: 265639
Diagnosed by: markj
Reviewed by: jhb, markj
Co-authored-by: jhb
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D55560
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
- Move some of the braces under their respective conditionals to make the
statements more self-encapsulated and only define the `aliasreq` union
in the event either INET or INET6 is defined.
- Fix a copy-paste error: `in_gre_ioctl` should be `in6_gre_ioctl` in the
INET6 case.
Reported by: tinderbox
Fixes: e1e18cc12e68 ("if_gre: Add netlink support with tests")
Differential Revision: https://reviews.freebsd.org/D55546
|
| |
|
|
|
|
|
| |
Approved by: so
Security: FreeBSD-SA-26:05.route
Security: CVE-2026-3038
Fixes: 92be2847e845 ("rtsock: Avoid copying uninitialized padding bytes")
|
| |
|
|
|
|
|
|
|
| |
Reported by: pho
Reviewed by: markj
Fixes: d4062b9f16e46f039f2b5b40dd35592b5dabf00c
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D55447
|
| |
|
|
|
|
|
|
| |
Migrate to new if_clone KPI and implement netlink support
for gre(4). Also refactor some of the gre specific ioctls.
Reviewed by: glebius, zlei
Differential Revision: https://reviews.freebsd.org/D54443
|
| |
|
|
|
|
|
| |
The bpf_attachd() will perform bpf_detachd() itself. Performing it twice
will lead to doing CK_LIST_REMOVE twice.
Reported & tested by: bz
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The init routine of a lagg(4) interface will not change during the whole
lifecycle. So we can call lagg_init() directly instead of through the
function pointer. Well, that requires a drop and pickup lock, which
unnecessarily expose a small race window. Refactor lagg_init() into
lagg_init_locked() and call the later one to avoid that.
Meanwhile, delay updating the driver managed status until after the
interface is really ready.
Reviewed by: markj
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D55198
|
| |
|
|
| |
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
| |
|
|
|
|
|
|
|
|
|
| |
This used to be needed when interface renames were broadcast using the
ifnet_departure_event eventhandler, but since commit 349fcf079ca3
("net: add ifnet_rename_event EVENTHANDLER(9) for interface renaming"),
it has no purpose. Remove it.
Reviewed by: pouria, zlei
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D55171
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds native support for the SIOCGIFDOWNREASON ioctl in iflib.
When ifconfig issues SIOCGIFDOWNREASON, the request is now routed through a
new driver callback (IFDI_GET_DOWNREASON). iflib allocates the ifdownreason
structure, calls the driver to fill the down-reason message, and then
returns the data back to ifconfig for display.
Without this change, iflib-based drivers cannot implement link-down reason
reporting even if the hardware provides the information.
No functional change for existing drivers unless they implement the new
IFDI_GET_DOWNREASON method. Existing drivers continue to behave as before.
Reviewed by: gallatin, erj, kgalazka, ssaxena, #iflib
Differential Revision: https://reviews.freebsd.org/D54045
MFC After: 1 week
|
| |
|
|
|
|
|
|
|
|
|
|
| |
It is declared as static. Make the definition consistent with the
declaration.
It was ever fixed by commit 52e53e2de0ec, but the commit was reverted,
leaving it unfixed.
No functional change intended.
MFC after: 3 days
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is set but never used. Remove it to avoid confusion and save a
little space.
While here, use designated initializers to initialize the LAGG protocol
table. That improves readability, and it will be safer to initialize the
table if we introduce new protocols in the future.
No functional change intended.
Reviewed by: glebius
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D55124
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All the other protocols have corresponding start and input routines,
which are used in the fast path. Currently the none protocol is
treated specially. In the fast path it is checked to indicate whether
a working protocol is configured. There are two issues raised by this
design:
1. In production, other protocols are commonly used, but not the
none protocol. It smells like an overkill to always check it in the
fast path. It is unfair to other commonly used protocols.
2. PR 289017 reveals that there's a small window between checking the
protocol and calling lagg_proto_start(). lagg_proto_start() is possible
to see the none protocol and do NULL deferencing.
Fix them by making the none protocol a first-class citizen so that it
has start and input routines just the same as other protocols. Then we
can stop checking it in the fast path, since lagg_proto_start() and
lagg_proto_input() will never fail to work.
The error ENETDOWN is chosen for the start routine. Obviously no active
ports are available, and the packets will go nowhere. It is also a
better error than ENXIO, since indeed the interface is configured and
has a TX algorithm (the none protocol).
PR: 289017
Diagnosed by: Qiu-ji Chen <chenqiuji666@gmail.com>
Tested by: Gui-Dong Han <hanguidong02@gmail.com>
Reviewed by: glebius
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D55123
|
| |
|
|
|
|
|
|
|
|
| |
During packet processing the descriptor is looked up using epoch(9) and it
can be accessed after bpf_detachd(). In scenario of descriptor close the
tap point is alive (it actually produces packets) and thus the pointer can
be legitimately dereferenced. This fixes a race on a bpf(4) device close
that would otherwise result in panic.
Differential Revision: https://reviews.freebsd.org/D55064
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In pf_match_rule() we attempt to append matching rules to the end of
'match_rules'. We want to preserve the order to make the multiple
pflog entries easier to understand. So we keep track of the last added
rule item in 'rt'. However, that assumed that 'match_rules' was only
ever added to in that one call to pf_match_rules(). This isn't always
the case, for example if we have match rules in different anchors.
In that case we'd end up using the uninitialised 'rt' variable in the
SLIST_INSERT_AFTER call.
Instead track the match rules and the last matching rule (to enable
easy appending) in the struct pf_test_ctx.
This also allows us to reduce the number of arguments for some
functions, because we passed a ctx to most functions that needed
'match_rules'.
While here also make pf_match_rules() static, because it's only ever
used in pf.c
Add a test case to exercise the relevant code path.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add capability VLAN_HWTAGGING to the epair interface and enable it by
default.
When sending a packet over a VLAN interface that uses an epair
interface, the flag M_VLANTAG and the ether_vtag (which contains the
VLAN ID and/or PCP) are set in the mbuf to inform the hardware that
the VLAN header has to be added. The sending epair end does not need
to actually add a VLAN header. It can just pass the mbuf with this
setting to the other epair end, which receives the packet. The
receiving epair end can just pass the mbuf with this setting to the
upper layer. Due to this setting, the upper layer believes that there
was a VLAN header that has been removed by the interface.
If the packet later leaves the host, the outgoing physical interface
can add the VLAN header in hardware if it supports VLAN_HWTAGGING.
If not, the implementation of Ethernet or bridge adds the VLAN header
in software.
Reviewed by: zlei, tuexen
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D52465
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add sched_find_l2_neighbor(). This really should be not
scheduler-depended, in does not have anything to do with scheduler at
all. But for now keep the same code structure.
Reviewed by: olce
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D54831
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stop using struct nd_ifinfo for that, because it is an API struct for
SIOCGIFINFO_IN6. The functional changes are isolated to the protocol
attach and detach: in6_ifarrival(), nd6_ifattach(), in6_ifdeparture(),
nd6_ifdetach(), as well as to the nd6_ioctl(), nd6_ra_input(),
nd6_slowtimo() and in6_ifmtu().
The dad_failures member was just renamed to match the rest. The M_IP6NDP
malloc(9) type declaration moved to files that actually use it.
The rest of the changes are mechanical substitution of double pointer
dereference via ND_IFINFO() to a single pointer dereference. This was
achieved with a sed(1) script:
s/ND_IFINFO\(([a-z0-9>_.-]+)\)->(flags|linkmtu|basereachable|reachable|retrans|chlim)/\1->if_inet6->nd_\2/g
s/nd_chlim/nd_curhoplimit/g
Reviewed by: tuexen, madpilot
Differential Revision: https://reviews.freebsd.org/D54725
|
| |
|
|
|
|
|
|
|
|
|
|
| |
When adding the IFLIB_GET_MBUF/FLAGS, I neglected to NULL out the
mbuf in the descriptor ring. I didn't think this should matter as
the I thought this code was only used when the ring was about
to be freed. But I was wrong, and leaving a stale mbuf in there can
cause panics.
Reported by: Marek Zarychta (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=292547)
Fixes: 14d93f612f26
Sponsored by: Netflix
|