| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow protocol layers to look up an inpcb belonging to a particular FIB.
This is indicated by setting INPLOOKUP_FIB; if it is set, the FIB to be
used is obtained from the specificed mbuf or ifnet.
No functional change intended.
Reviewed by: glebius, melifaro
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48662
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a flag, INPBIND_FIB, which means that the inpcb is local to its FIB
number. When this flag is specified, duplicate bindings are permitted,
so long as each FIB contains at most one inpcb bound to the same
address/port. If an inpcb is bound with this flag, it'll have the
INP_BOUNDFIB flag set.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48661
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to enable a mode where duplicate inpcb bindings are permitted,
and we want to look up an inpcb with a particular FIB. Thus, add a
"fib" parameter to in_pcblookup() and related functions, and plumb it
through.
A fib value of RT_ALL_FIBS indicates that the lookup should ignore FIB
numbers when searching. Otherwise, it should refer to a valid FIB
number, and the returned inpcb should belong to the specific FIB. For
now, just add the fib parameter where needed, as there are several
layers to plumb through.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48660
|
| |
|
|
| |
Trust the framework to set the generic methods. No functional change.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To comply with Common Criteria certification requirements, it may be
necessary to ensure that packets to 0.0.0.0/::0 are dropped and logged
by the system firewall. Currently, such packets are dropped by
ip_input() and ip6_input() before reaching pfil hooks; let's defer the
checks slightly to give firewalls a chance to drop the packets
themselves, as this gives better observability. Add some regression
tests for this with pf+pflog.
Note that prior to commit 713264f6b8b, v4 packets to the unspecified
address were not dropped by the IP stack at all.
Note that ip_forward() and ip6_forward() ensure that such packets are
not forwarded; they are passed back unmodified.
Add a regression test which ensures that such packets are visible to
pflog.
Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Klara, Inc.
Sponsored by: OPNsense
Differential Revision: https://reviews.freebsd.org/D48163
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Only map mbuf when a policy is looked up and indicates that IPSEC needs
to transform the packet. If IPSEC is inline offloaded, it is up to the
interface driver to request remap if needed.
Fetch the IP header using m_copydata() instead of using mtod() to select
policy/SA.
Reviewed by: markj
Sponsored by: NVidia networking
Differential revision: https://reviews.freebsd.org/D48265
|
| |
|
|
|
|
|
|
|
|
|
| |
but instead of tripping the assert in debug kernel, and silently falling
into UB for prod, skip IPSEC processing for KTLS framed packets when
mb_unmapped_to_ext() failed.
Reviewed by: markj
Sponsored by: NVidia networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D48265
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See commit 4f02a7d739b3 for more background.
I cannot see a good reason to continue ignoring mismatching UIDs when
binding to INADDR_ANY. Looking at the sdr.V2.4a7n sources (mentioned in
bugzilla PR 7713), there is a CANT_MCAST_BIND hack wherein the
application binds to INADDR_ANY instead of a multicast address, but
CANT_MCAST_BIND isn't defined for FreeBSD builds.
It seems unlikely that we still have a use-case for allowing sockets
from different UIDs to bind to the same port when binding to the
unspecified address. And, as noted in D47832, applications like sdr
would have been broken by the inverted SO_REUSEPORT check removed in
that revision, apparently without any bug reports. Let's break
compatibility and simply disallow this case outright.
Also, add some comments, remove a hack in a regression test which tests
this funtionality, and add a new regression test to exercise the
remaining checks that were added in commit 4658dc8325e03.
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47870
|
| |
|
|
| |
Sponsored by: NVidia networking
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Actually check the conditions that are enforced by the error checking
code instead of a condition which is
* checking a number to be non-negative instead of positive
* depending on a random number
Perform the checks consistently for ICMPv4 and ICMPv6.
Reviewed by: glebius, rrs, cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D48001
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This check for SO_REUSEPORT was added way back in commit 52b65dbe85faf.
Per the commit log, this commit restricted this port-stealing check to
unicast addresses, and then only if the existing socket does not have
SO_REUSEPORT set. In other words, if there exists a socket bound to
INADDR_ANY, and we bind a socket to INADDR_ANY with the same port, then
the two sockets need not be owned by the same user if the existing
socket has SO_REUSEPORT set.
This is a surprising semantic; bugzilla PR 7713 gives some additional
context. That PR makes a case for the behaviour described above when
binding to a multicast address. But, the SO_REUSEPORT check is only
applied when binding to a non-multicast address, so it doesn't really
make sense. In the PR the committer notes that "unicast applications
don't set SO_REUSEPORT", which makes some sense, but also refers to
"multicast applications that bind to INADDR_ANY", which sounds a bit
suspicious.
OpenBSD performs the multicast check, but not the SO_REUSEPORT check.
DragonflyBSD removed the SO_REUSEPORT (and INADDR_ANY) checks back in
2014 (commit 0323d5fde12a4). NetBSD explicitly copied our logic and
still has it.
The plot thickens: 20 years later, SO_REUSEPORT_LB was ported from
DragonflyBSD: this option provides similar semantics to SO_REUSEPORT,
but for unicast addresses it causes incoming connections/datagrams to be
distributed among all sockets in the group. This commit (1a43cff92a20d)
inverted the check for SO_REUSEPORT while adding one for
SO_REUSEPORT_LB; this appears to have been inadvertent. However:
- apparently no one has noticed that the semantics were changed;
- sockets belonging to different users can now be bound to the same port
so long as they belong to a single lbgroup bound to INADDR_ANY, which
is not correct.
Simply remove the SO_REUSEPORT(_LB) checks, as their original
justification was dubious and their current implementation is wrong; add
some tests.
Reviewed by: glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47832
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a long time, the inpcb lookup path has been lockless in the common
case: we use net_epoch to synchronize lookups. However, the routines
which update lbgroups were not careful to synchronize with unlocked
lookups. I believe that in the worst case this can result in spurious
connection aborts (I have a regression test case to exercise this), but
it's hard to be certain.
Modify in_pcblbgroup* routines to synchronize with unlocked lookup:
- When removing inpcbs from an lbgroup, do not shrink the array.
The maximum number of lbgroup entries is INPCBLBGROUP_SIZMAX (256),
and it doesn't seem worth the complexity to shrink the array when a
socket is removed.
- When resizing an lbgroup, do not insert it into the hash table until
it is fully initialized; otherwise lookups may observe a partially
constructed lbgroup.
- When adding an inpcb to the group, increment the counter after adding
the array entry, using a release store. Otherwise it's possible for
lookups to observe a null array slot.
- When looking up an entry, use a corresponding acquire load.
Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48020
|
| | |
|
| |
|
|
| |
Fixes: 01f8ce83242d ("inpcb: Factor out parts of in6_pcbbind() and in_pcbbind_setup()")
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A large portion of these functions just determines whether the inpcb can
bind to the address/port. This portion has no side effects, so is a
good candidate to move into its own helper function. This patch does
so, making the callers less complicated and reducing indentation.
While moving this code, also make some changes:
- Load socket options (SO_REUSEADDR etc.) only once. There is nothing
preventing another thread from toggling the socket options, so make
this function easier to reason about by avoiding races.
- When checking whether the bind address is an interface address, make a
separate sockaddr rather than temporarily modifying the one passed to
in_pcbbind().
Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47590
|
| |
|
|
|
|
|
|
| |
No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
|
| |
|
|
|
|
|
|
| |
No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
|
| | |
|
| |
|
|
| |
There is no point in doing that when we operate on a particular inpcb.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Use the local var "laddr" instead of sin->sin_addr in one block.
- Use in_nullhost() instead of explicit comparisons with INADDR_ANY.
- Combine multiple socket options checks into one.
- Fix indentation.
- Remove some unhelpful comments.
This is in preparation for some simplification and bug-fixing.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47451
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
in_pcblookup_hash_wild_* looks up unconnected inpcbs, so there is no
point in passing the foreign address and port, and indeed those
parameters are not used. So, remove them.
No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47385
|
| |
|
|
|
|
|
|
| |
Found while auditing calls to M_WRITABLE to see if M_EXTPG could be
removed from its checks.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D46785
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the V_connect_ifaddr_wild sysctl says that we shouldn't infer a
destination address, return an error. Otherwise it's possible for use
of an unspecified foreign address to trigger a subsequent assertion
failure, for example in in_pcblookup_hash_locked().
Similarly, if no interface addresses are assigned, fail quickly upon an
attempt to connect to the unspecified address.
Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after: 2 weeks
Reviewed by: zlei, allanjude, emaste
Differential Revision: https://reviews.freebsd.org/D46454
|
| |
|
|
|
|
|
|
| |
See the discussion in Bugzilla PR 280705 for context.
PR: 280705
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46259
|
| |
|
|
|
|
| |
This reverts commit 71867653008ce17a66a9c935e9dc29c1320bf48b.
Two tests of the test suite are failing. Reverting the change
until it is improved.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inp_route pointer should only be provided to the network
layer, when no destination address is provided. This is only
one of the conditions, where a write lock is needed.
If, for example, the route is also cached, when the socket is
unbound, problems show up, when the sendto is called, then
connect and finally send, when the route for the addresses
provided in the sendto and connect call use different outgoing
interfaces.
While there, clearly document why the write lock is taken.
Reported by: syzbot+59122d2e848087d3355a@syzkaller.appspotmail.com
Reviewed by: Peter Lei, glebius
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46056
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The nd6 code listens for RTM_DELETE events so that it can mark the
corresponding default router as inactive in the case where the default
route is deleted. A subsequent RA from the router may then reinstall
the default route.
Commit fedeb08b6a58e broke this for non-multipath routes, as
rib_decompose_notification() only invokes the callback for multipath
routes. Restore the old behaviour. Also ensure that we update the
router only for RTM_DELETE notifications, lost in commit 2259a03020fe0.
Reviewed by: bz
Fixes: fedeb08b6a58 ("Introduce scalable route multipath.")
Fixes: 2259a03020fe ("Rework part of routing code to reduce difference to D26449.")
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Bell Tower Integration
Differential Revision: https://reviews.freebsd.org/D46020
|
| |
|
|
|
|
|
|
|
| |
- s/collasped/collapsed/
- s/defininitions/definitions/
- s/optionaly/optionally/
Obtained from: NetBSD
MFC after: 3 days
|
| |
|
|
|
|
|
|
| |
To be able to pass ifp and mtu to the ipsec_output() and ipsec
accelerator filter.
Sponsored by: NVIDIA networking
Differential revision: https://reviews.freebsd.org/D44225
|
| |
|
|
|
|
|
| |
Similarly, mtu is needed to decide inline IPSEC offloiad for the driver.
Sponsored by: NVIDIA networking
Differential revision: https://reviews.freebsd.org/D44224
|
| |
|
|
|
|
|
|
| |
The information about the interface is needed to coordinate inline
offloading of IPSEC processing with corresponding driver.
Sponsored by: NVIDIA networking
Differential revision: https://reviews.freebsd.org/D44223
|
| | |
|
| |
|
|
|
|
| |
The code deleted predates FreeBSD history. The comment deleted is 99%
outdated. Why KAME decided to use these constants instead of normal ones
also lost in centuries.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only element of of in6_addr that is specified in RFC 3493 or
in POSIX.1-2017 is s6_addr, implemented via a #define to a union
member. However, FreeBSD and other BSD systems have additional
definitions for the other union members, s6_addr{8,16,32} which
are defined for the kernel and loader. Some Linux applications
also use them, and they seem to be allowed by the RFC and POSIX.
Remove the current ifdefs, exposing the additional fields to user
level, and replace with #if __BSD_VISIBLE. Add an explanatory
comment expanding on the previous "nonstandard" comment.
MFC after: 1 week
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D44979
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in6_mapped_sockaddr() and in6_mapped_peeraddr() both define a local
variable named 'inp', but in the non-INET case, this variable is set
and never used, causing a compiler error:
/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:547:16: error:
variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
547 | struct inpcb *inp;
| ^
/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:573:16: error:
variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
573 | struct inpcb *inp;
Fix this by guarding all the INET-specific logic, including the variable
definition, behind #ifdef INET.
While here, tweak formatting in in6_mapped_peeraddr() so both functions
are the same.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1155
|
| |
|
|
| |
Fixes: 4399e055ea610cdefa1470ad1ee614dd81ba5e56
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When debugging network issues one common clue is an unexpectedly
incrementing error counter. This is helpful, in that it gives us an
idea of what might be going wrong, but often these counters may be
incremented in different functions.
Add a static probe point for them so that we can use dtrace to get
futher information (e.g. a stack trace).
For example:
dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'
This can be disabled by setting the following kernel option:
options KDTRACE_NO_MIB_SDT
Reviewed by: gallatin, tuexen (previous version), gnn (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43504
|
| |
|
|
|
|
|
|
| |
Zero means limit is disabled, so the value doesn't need to be checked
against jitter value.
Fixes: ac44739fd834f51cacb26485a4140fd482e20150
Fixes: a03aff88a14448c3084a0384082ec996d7213897
|
| |
|
|
|
|
|
|
|
| |
Use counter_ratecheck() instead of racy and slow ppsratecheck. Use a
separate counter for every currently known type of ICMPv6. Provide logging
of ratelimit events. Provide jitter to counter open UDP port detection.
Reviewed by: tuexen, zlei
Differential Revision: https://reviews.freebsd.org/D44482
|
| |
|
|
|
|
|
|
|
| |
Most of them can be declared as static after the move out of in6_proto.c.
Keeping sysctl(9) declarations with their text descriptions next to the
variable declaration create self-documenting code. There should be no
functional changes.
Differential Revision: https://reviews.freebsd.org/D44481
|
| |
|
|
|
|
|
|
| |
The generation of ICMP6_ECHO_REPLY bypasses icmp6_error(), thus rate
limit was not applied.
Reviewed by: tuexen, zlei
Differential Revision: https://reviews.freebsd.org/D44480
|
| |
|
|
|
| |
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D44479
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When profiling an IP6 heavy workload, I noticed that we were
getting a lot of cache misses in ip6_output() around
ip6_pktopts. This was happening because the TCP stack passes
inp->in6p_outputopts even if all options are unused. So in the
common case of no options present, pkt_opts is not null, and is
checked repeatedly for different options. Since ip6_pktopts is
large (4 cachelines), and every field is checked, we take 4
cache misses (2 of which tend to be hidden by the adjacent line
prefetcher).
To fix this common case, I introduced a new flag in ip6_pktopts
(ip6po_valid) which tracks which options have been set. In the
common case where nothing is set, this causes just a single
cache miss to load. It also eliminates a test for some options
(if (opt != NULL && opt->val >= const) vs if ((optvalid & flag) !=0 )
To keep the struct the same size in 64-bit kernels, and to keep
the integer values (like ip6po_hlim, ip6po_tclass, etc) on the
same cacheline, I moved them to the top.
As suggested by zlei, the null check in MAKE_EXTHDR() becomes
redundant, and can be removed.
For our web server workload (with the ip6po_tclass option set),
this drops the CPI from 2.9 to 2.4 for ip6_output
Differential Revision: https://reviews.freebsd.org/D44204
Reviewed by: bz, glebius, zlei
No Objection from: melifaro
Sponsored by: Netflix Inc.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't report a BACKUP CARP address as local. These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation. For this purpose we definitely
want to treat BACKUP addresses as non local.
This change is conservative and doesn't modify compat in_localip() and
in6_localip(). They are used more widely than the FIB-aware versions.
The change would modify the notion of ipfw(4) 'me' keyword. There might
be other consequences as in_localip() is used by various tunneling
protocols.
PR: 277349
|
| |
|
|
|
|
|
| |
Provide a comment in sorflush() why the socket I/O sx(9) lock is actually
important.
This reverts commit 507f87a799cf0811ce30f0ae7f10ba19b2fd3db3.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).
If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).
This allows us to remove a few extraneous NULL checks.
Suggested by: tuexen
Reviewed by: tuexen, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43617
|
| |
|
|
| |
MFC after: 1 week
|
| |
|
|
|
|
| |
- s/adddress/address/
MFC after: 3 days
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch provides UDP encapsulation of ESP packets over IPv6.
Ports the IPv4 code to IPv6 and adds support for IPv6 in udpencap.c
As required by the RFC and unlike in IPv4 encapsulation,
UDP checksums are calculated.
Co-authored-by: Aurelien Cazuc <aurelien.cazuc.external@stormshield.eu>
Sponsored-by: Stormshield
Sponsored-by: Wiktel
Sponsored-by: Klara, Inc.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With removal of dom_dispose method the function boils down to two
meaningful function calls: socantrcvmore() and sbrelease(). The latter is
only relevant for protocols that use generic socket buffers.
The socket I/O sx(9) lock acquisition in sorflush() is not relevant for
shutdown(2) operation as it doesn't do any I/O that may interleave with
read(2) or write(2). The socket buffer mutex acquisition inside
sbrelease() is what guarantees thread safety. This sx(9) acquisition in
soshutdown() can be tracked down to 4.4BSD times, where it used to be
sblock(), and it was carried over through the years evolving together with
sockets with no reconsideration of why do we carry it over. I can't tell
if that sblock() made sense back then, but it doesn't make any today.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43415
|