| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
| |
Change 4787572d0580 made if_alloc_domain() never fail, then also do the
wrappers if_alloc(), if_alloc_dev(), and if_gethandle().
No functional change intended.
Reviewed by: kp, imp, glebius, stevek
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D45740
|
| |
|
|
|
|
|
| |
My failure to run all kinds of kernel builds lead to missing the keysock
and incorrectly assuming SDP as not having a shutdown method.
Fixes: 5bba2728079ed4da33f727dbc2b6ae1de02ba897
|
| |
|
|
| |
Submitted by: zlei
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Just like it was done for accept(2) in cfb1e92912b4, use same approach
for two simplier syscalls that return socket addresses. Although,
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.
Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D42694
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may provide
smaller structure.
While rewriting accept(2) make 'addrlen' a true in/out parameter, reporting
required length in case if provided length was insufficient. Our manual
page accept(2) and POSIX don't explicitly require that, but one can read
the text as they do. Linux also does that. Update tests accordingly.
Reviewed by: rscheff, tuexen, zlei, dchagin
Differential Revision: https://reviews.freebsd.org/D42635
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
|
| |
|
|
| |
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
| |
|
|
| |
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6bf. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a4. However in e87c4940156 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.
This change not only reverts e87c4940156 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch. A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.
Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence
of network epoch. Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.
Reviewed by: zlei, gallatin, erj
Differential Revision: https://reviews.freebsd.org/D39510
|
| |
|
|
|
|
|
|
|
| |
This is counterpart to e87c4940156c, which did the same for ethernet.
Suggested by: hselasky
Reviewed by: hselasky, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39405
|
| |
|
|
|
|
|
|
|
|
| |
Summary:
Because of the intricacies of this code it wasn't purely scripted, but
instead hand-mechanical.
Reviewed by: hselasky
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38560
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary:
All callers of infiniband_bpf_mtap() call it through the wrapper macro,
which checks the if_bpf member explicitly. Since this is getting
hidden, move this check into the internal function and remove the
wrapper macro.
Reviewed by: hselasky
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39024
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d18ef), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860dcd).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee54ea).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pr_ctlinput method was a feature of IPv4/IPv6 with exception of
pfctlinput(), which broadcasted a call to pr_ctlinput on all protocols
ever registered statically or with pf_proto_register(). Now that
this broadcast call is gone, the only protocols that get their
pr_ctlinput ever called are those that have registered itselves with
ipproto_register() or ip6proto_register().
It is entirely possible that code deleted now was dead code from very
beginning. Just a copy-paste from TCP.
Reviewed by: rstone
Differential revision: https://reviews.freebsd.org/D36208
|
| |
|
|
|
|
|
|
| |
For some reason protosw.h is used during world complation and userland
is not aware of caddr_t, a relic from the first version of C. Broken
buildworld is good reason to get rid of yet another caddr_t in kernel.
Fixes: 886fc1e80490fb03e72e306774766cbb2c733ac6
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
o Retire SS_FDREF as it is basically a debug flag on top of already
existing soref()/sorele().
o Convert SS_PROTOREF into soref()/sorele().
o Change reference model for the listen queues, see below.
o Make sofree() private. The correct KPI to use is only sorele().
o Make soabort() respect the model and sorele() instead of sofree().
Note on listening queues. Until now the sockets on a queue had zero
reference count. And the reference were given only upon accept(2). The
assumption was that there is no way to see the queued socket from anywhere
except its head. This is not true, since queued sockets already have pcbs,
which are linked at least into the global pcb lists. With this change we
put the reference right in the sonewconn() and on accept(2) path we just
hand the existing reference to the file descriptor.
Differential revision: https://reviews.freebsd.org/D35679
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since c67f3b8b78e the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it. In 74a68313b50 macros that access
this mutex directly were added. Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.
This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.
This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally. However, it allows operation of a
protocol that doesn't use them.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35152
|
| |
|
|
|
|
| |
- s/for for/for/
MFC after: 3 days
|
| |
|
|
|
|
|
| |
There left only three modules that used dom_init(). And netipsec
was the last one to use dom_destroy().
Differential revision: https://reviews.freebsd.org/D33540
|
| |
|
|
|
|
|
| |
No functional change.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D33639
|
| |
|
|
|
| |
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a
socket rather than a socket buffer. Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.
When locking for I/O, return an error if the socket is a listening
socket. Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.
Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In
particular, add checks to sendfile() and sorflush().
Reviewed by: tuexen, gallatin
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31657
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Overview:
This is the first stage of a RDMA stack upgrade introducing kernel
changes only based on Linux 5.7-rc1.
This patch is based on about four main areas of work:
- Update of the IB uobjects system:
- The memory holding so-called AH, CQ, PD, SRQ and UCONTEXT objects
is now managed by ibcore. This also require some changes in the
kernel verbs API. The updated verbs changes are typically about
initialize and deinitialize objects, and remove allocation and
free of memory.
- Update of the uverbs IOCTL framework:
- The parsing and handling of user-space commands has been
completely refactored to integrate with the updated IB uobjects
system.
- Various changes and updates to the generic uverbs interfaces in
device drivers including the new uAPI surface.
- The mlx5_ib_devx.c in mlx5ib and related mlx5 core changes.
Dependencies:
- The mlx4ib driver code has been updated with the minimum changes
needed.
- The mlx5ib driver code has been updated with the minimum changes
needed including DV support.
Compatibility:
- All user-space facing APIs are backwards compatible after this
change.
- All kernel-space facing RDMA APIs are backwards compatible after
this change, with exception of ib_create_ah() and ib_destroy_ah()
which takes a new flag.
- The "ib_device_ops" structure exist, but only contains the driver ID
and some structure sizes.
Differences from Linux:
- Infiniband drivers must use the INIT_IB_DEVICE_OPS() macro to set
the sizes needed for allocating various IB objects, when adding
IB device instances.
Security:
- PRIV_NET_RAW is needed to use raw ethernet transmit features.
- PRIV_DRIVER is needed to use other privileged operations.
Based on upstream Linux, Torvalds (5.7-rc1):
8632e9b5645bbc2331d21d892b0d6961c1a08429
MFC after: 1 week
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31149
Sponsored by: NVIDIA Networking
|
| |
|
|
|
|
|
|
| |
Fixes the IPOIB_CM and SDP kernel options.
Reported by: lwhsu @
MFC after: 1 week
Sponsored by: NVIDIA Networking
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
attach and detach.
Call infiniband_ifdetach() early to stop ifioctl(9) calls from user-space
during device removal. Also make sure that ifioctl(9) calls are blocked from
executing until the device is fully initialized. Ideally we would delay the
infiniband_ifattach() call, but because part of the initialization is to update
the link level address, that is not possible without more significant changes.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.
Linux commit:
f696bf6d64b195b83ca1bdb7cd33c999c9dcf514
7bb1fafc2f163ad03a2007295bb2f57cfdbfb630
d34ac5cd3a73aacd11009c4fc3ba15d7ea62c411
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
|
| |
|
|
|
|
|
|
|
| |
Remove not needed error handling when destroying a CQ. The function in
question will later on be updated to return "void".
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given all the code does operate on struct ifnet, the last step in this
longer series of changes now is to rename struct net_device to
struct ifnet (that is what it was defined to in the LinuxKPi code).
While mlx4 and OFED are "shared" code the decision was made years ago
to not write it based on the netdevice KPI but the native ifnet KPI
for most of it. This commit simply spells this out and with that
frees "struct netdevice" to be re-done on LinuxKPI to become a more
native/mixed implementation over time as needed by, e.g., wireless
drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30515
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The LinuxKPI net_device actually is an ifnet; in order to further
clean that up so we can extend "net_device" migrate the few macros
left into ofed and make sure the header is included in all files
which need access to the macros.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30477
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Several protocol methods take a sockaddr as input. In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur. Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.
Reported by: syzkaller+KASAN
Reviewed by: tuexen, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29519
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The computed IPOIB_CM_RX_SG is too small. It doesn't account for fallback
to mbuf clusters when jumbo frames are not available and it also doesn't
account for the packet header and trailer mbuf.
This causes a memory overwrite situation when IPOIB_CM is configured.
While at it add a kernel assert to ensure the mapping array is not overwritten.
PR: 254474
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in
exactly the same form that it appears here modulo slightly different
context. It used to be the case that there was a single pr_usrreq
method with requests dispatched to it; these exact two lines appeared in
tcp_usrreq's PRU_ATTACH handling.
The only purpose of this that I can find is to cause surprising behavior
on accepted connections. Newly-created sockets will never hit these
paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is
set on a listening socket and inherited, one would expect the timeout to
be inherited rather than changed arbitrarily like this -- noting that
SO_LINGER is nonsense on a listening socket beyond inheritance, since
they cannot be 'connected' by definition.
Neither Illumos nor Linux reset the timer like this based on testing and
inspection of Illumos, and testing of Linux.
Reviewed by: rscheff, tuexen
Differential Revision: https://reviews.freebsd.org/D28265
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
into net/if_infiniband.c and net/infiniband.h . No functional change
intended.
Differential Revision: https://reviews.freebsd.org/D26254
Reviewed by: melifaro@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Notes:
svn path=/head/; revision=366930
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Call M_SETFIB() to make sure the IPoIB packet is directed to the correct
interface-specific FIB.
This was sufficient to allow general-purpose routing using the default FIB,
and a separate FIB for routing between IPoIB on ib0 and IPoEthernet on mce0.
Reviewed by: hselasky
Obtained from: Anmol Kumar <anmolk at panasas dot com>
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D25239
Notes:
svn path=/head/; revision=366686
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the linking order of the infiniband, IB, modules decide in which
order the clients are attached and detached. For example one IB client may
use resources from another IB client. This can lead to a potential deadlock
at shutdown. For example if the ipoib is unregistered after the ib_multicast
client is detached, then if ipoib is using multicast addresses a deadlock may
happen, because ib_multicast will wait for all its resources to be freed before
returning from the remove method.
Fix this by using module_xxx_order() instead of module_xxx().
Differential Revision: https://reviews.freebsd.org/D23973
MFC after: 1 week
Sponsored by: Mellanox Technologies
Notes:
svn path=/head/; revision=362953
|
| |
|
|
|
|
|
|
|
|
| |
The ipoib_unicast_send() function is not supposed to unlock the priv lock.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Notes:
svn path=/head/; revision=359014
|
| |
|
|
|
|
|
|
| |
MFC after: 1 week
Sponsored by: Mellanox Technologies
Notes:
svn path=/head/; revision=358694
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Notes:
svn path=/head/; revision=358333
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Else the following panic may happen:
panic()
icmp_error()
ipoib_cm_mb_reap()
linux_work_fn()
taskqueue_run_locked()
taskqueue_thread_loop()
fork_exit()
fork_trampoline()
Submitted by: Andreas Kempe <kempe@lysator.liu.se>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Notes:
svn path=/head/; revision=356633
|
| |
|
|
|
|
|
| |
Sponsored by: Mellanox Technologies
Notes:
svn path=/head/; revision=353631
|
| |
|
|
|
|
|
| |
Sponsored by: Mellanox Technologies
Notes:
svn path=/head/; revision=353546
|
| |
|
|
|
|
|
| |
Reviewed by: hselasky
Notes:
svn path=/head/; revision=353504
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
Notes:
svn path=/head/; revision=353292
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the software send queue gets filled up, callbacks to
if_transmit will stop. Make sure the transmit callback
routine checks the send queue and outputs any remaining
mbufs. Else the remaining mbufs may simply sit in the
output queue blocking the transmit path.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Notes:
svn path=/head/; revision=352955
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This regression was introduced in the r326169 Linux v4.9 Infiniband upgrade.
Restore the functionality.
Reviewed by: hselasky
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21298
Notes:
svn path=/head/; revision=351176
|
| |
|
|
|
|
|
|
|
| |
Lost in translation between different SDP stacks.
Reported by: hselasky
Notes:
svn path=/head/; revision=351169
|
| |
|
|
|
|
|
| |
Sponsored by: Dell EMC Isilon
Notes:
svn path=/head/; revision=351162
|
| |
|
|
|
|
|
| |
Sponsored by: Dell EMC Isilon
Notes:
svn path=/head/; revision=351161
|
| |
|
|
|
|
|
| |
Sponsored by: Dell EMC Isilon
Notes:
svn path=/head/; revision=351160
|
| |
|
|
|
|
|
|
|
|
|
| |
This allows use of the shared _net_inet_sdp in more than one compilation
unit. (Nothing in-tree uses this today, but some of Isilon's out-of-tree
SDP enhancements add sysctls below the node.)
Sponsored by: Dell EMC Isilon
Notes:
svn path=/head/; revision=351159
|