aboutsummaryrefslogtreecommitdiff
path: root/sys/netinet/in_var.h
Commit message (Collapse)AuthorAgeFilesLines
* Simplify dom_<rtattach|rtdetach>.Alexander V. Chernikov2020-08-141-2/+5
| | | | | | | | | | | | | | Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'. Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach. Add rib_subscribe_internal() function to accept subscriptions to the rnh directly. Differential Revision: https://reviews.freebsd.org/D26053 Notes: svn path=/head/; revision=364238
* Make sure the multicast release tasks are properly drained whenHans Petter Selasky2020-08-101-0/+1
| | | | | | | | | | | | | | | | destroying a VNET or a network interface. Else the inm release tasks, both IPv4 and IPv6 may cause a panic accessing a freed VNET or network interface. Reviewed by: jmg@ Discussed with: bz@ Differential Revision: https://reviews.freebsd.org/D24914 MFC after: 1 week Sponsored by: Mellanox Technologies Notes: svn path=/head/; revision=364073
* Eliminate now-unused parts of old routing KPI.Alexander V. Chernikov2020-04-281-3/+0
| | | | | | | | | | | | r360292 switched most of the remaining routing customers to a new KPI, leaving a bunch of wrappers for old routing lookup functions unused. Remove them from the tree as a part of routing cleanup. Differential Revision: https://reviews.freebsd.org/D24569 Notes: svn path=/head/; revision=360430
* Bring back redirect route expiration.Alexander V. Chernikov2020-01-221-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redirect (and temporal) route expiration was broken a while ago. This change brings route expiration back, with unified IPv4/IPv6 handling code. It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout. Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions. Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present. Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run. The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal. This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23075 Notes: svn path=/head/; revision=356984
* Widen NET_EPOCH coverage.Gleb Smirnoff2019-10-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111 Notes: svn path=/head/; revision=353292
* Convert all IPv4 and IPv6 multicast memberships into using a STAILQHans Petter Selasky2019-06-251-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead of a linear array. The multicast memberships for the inpcb structure are protected by a non-sleepable lock, INP_WLOCK(), which needs to be dropped when calling the underlying possibly sleeping if_ioctl() method. When using a linear array to keep track of multicast memberships, the computed memory location of the multicast filter may suddenly change, due to concurrent insertion or removal of elements in the linear array. This in turn leads to various invalid memory access issues and kernel panics. To avoid this problem, put all multicast memberships on a STAILQ based list. Then the memory location of the IPv4 and IPv6 multicast filters become fixed during their lifetime and use after free and memory leak issues are easier to track, for example by: vmstat -m | grep multi All list manipulation has been factored into inline functions including some macros, to easily allow for a future hash-list implementation, if needed. This patch has been tested by pho@ . Differential Revision: https://reviews.freebsd.org/D20080 Reviewed by: markj @ MFC after: 1 week Sponsored by: Mellanox Technologies Notes: svn path=/head/; revision=349369
* UDP: further performance improvements on txMatt Macy2018-05-231-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409 Notes: svn path=/head/; revision=334118
* ifnet: Replace if_addr_lock rwlock with epoch + mutexMatt Macy2018-05-181-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366 Notes: svn path=/head/; revision=333813
* r333175 introduced deferred deletion of multicast addresses in order to ↵Matt Macy2018-05-061-0/+4
| | | | | | | | | | | | | | | | | | permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a a multicast update would still be queued for deletion when ifconfig deleted the interface thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses on the interface. Synchronously remove all external references to a multicast address before enqueueing for delete. Reported by: lwhsu Approved by: sbruno Notes: svn path=/head/; revision=333309
* Separate list manipulation locking from state change in multicastStephen Hurd2018-05-021-10/+38
| | | | | | | | | | | | | | Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead. Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969 Notes: svn path=/head/; revision=333175
* sys: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-201-0/+2
| | | | | | | | | | | | | | | | | Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Notes: svn path=/head/; revision=326023
* Renumber copyright clause 4Warner Losh2017-02-281-1/+1
| | | | | | | | | | | | Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96 Notes: svn path=/head/; revision=314436
* Add GARP retransmit capabilityEric van Gyzen2016-10-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A single gratuitous ARP (GARP) is always transmitted when an IPv4 address is added to an interface, and that is usually sufficient. However, in some circumstances, such as when a shared address is passed between cluster nodes, this single GARP may occasionally be dropped or lost. This can lead to neighbors on the network link working with a stale ARP cache and sending packets destined for that address to the node that previously owned the address, which may not respond. To avoid this situation, GARP retransmissions can be enabled by setting the net.link.ether.inet.garp_rexmit_count sysctl to a value greater than zero. The setting represents the maximum number of retransmissions. The interval between retransmissions is calculated using an exponential backoff algorithm, doubling each time, so the retransmission intervals are: {1, 2, 4, 8, 16, ...} (seconds). Due to the exponential backoff algorithm used for the interval between GARP retransmissions, the maximum number of retransmissions is limited to 16 for sanity. This limit corresponds to a maximum interval between retransmissions of 2^16 seconds ~= 18 hours. Increasing this limit is possible, but sending out GARPs spaced days apart would be of little use. Submitted by: David A. Bright <david.a.bright@dell.com> MFC after: 1 month Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D7695 Notes: svn path=/head/; revision=306577
* Get closer to a VIMAGE network stack teardown from top to bottom ratherBjoern A. Zeeb2016-06-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747 Notes: svn path=/head/; revision=302054
* MFP r287070,r287073: split radix implementation and route table structure.Alexander V. Chernikov2016-01-251-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon. Notes: svn path=/head/; revision=294706
* Remove now-unused wrappers for various routing functions.Alexander V. Chernikov2016-01-141-2/+0
| | | | Notes: svn path=/head/; revision=293886
* Revert r292275 & r292379Steven Hartland2015-12-171-3/+0
| | | | | | | | | | glebius has concerns about these changes so reverting those can be discussed and addressed. Sponsored by: Multiplay Notes: svn path=/head/; revision=292402
* Fix lagg failover due to missing notificationsSteven Hartland2015-12-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results is slow failover, dropped packets and network outages for the lagg interface when the primary link goes down. We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire a ifnet_link_event which are now monitored by rip and nd6. Upon receiving these events each protocol trigger the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route. The new behavour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link Also removed unused param from lagg_port_state and added descriptions for the sysctls while here. PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111 Notes: svn path=/head/; revision=292275
* Replace the fastforward path with tryforward which does not require aGeorge V. Neville-Neil2015-11-051-1/+1
| | | | | | | | | | | | | | sysctl and will always be on. The former split between default and fast forwarding is removed by this commit while preserving the ability to use all network stack features. Differential Revision: https://reviews.freebsd.org/D4042 Reviewed by: ae, melifaro, olivier, rwatson MFC after: 1 month Sponsored by: Rubicon Communications (Netgate) Notes: svn path=/head/; revision=290383
* Remove several compat functions from pre-fib era.Alexander V. Chernikov2015-10-171-2/+0
| | | | Notes: svn path=/head/; revision=289461
* Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.Andrey V. Elsukov2015-07-291-11/+12
| | | | | | | | | | | | | | Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149 Notes: svn path=/head/; revision=286001
* Move all code related to IP fragment reassembly to ip_reass.c. SomeGleb Smirnoff2015-04-101-9/+0
| | | | | | | | | | | | function names have changed and comments are reformatted or added, but there is no functional change. Claim copyright for me and Adrian. Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=281351
* The last userland piece of in_var.h is now 'struct in_aliasreq'. MoveGleb Smirnoff2015-02-191-12/+13
| | | | | | | it to the top of the file, and ifdef _KERNEL the rest. Notes: svn path=/head/; revision=279031
* Now that all users of _WANT_IFADDR are fixed, remove this crutch andGleb Smirnoff2015-02-191-2/+2
| | | | | | | | | | hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL. Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=279030
* - Rename 'struct igmp_ifinfo' into 'struct igmp_ifsoftc', since it reallyGleb Smirnoff2015-02-191-25/+3
| | | | | | | | | | | | | | | | represents a context. - Preserve name 'struct igmp_ifinfo' for a new structure, that will be stable API between userland and kernel. - Make sysctl_igmp_ifinfo() return the new 'struct igmp_ifinfo', instead of old one, which had a bunch of internal kernel structures in it. - Move all above declarations from in_var.h to igmp_var.h, since they are private to IGMP code. Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=279026
* Widen _KERNEL ifdef to hide more kernel network stack structures from userland.Gleb Smirnoff2015-02-191-9/+0
| | | | Notes: svn path=/head/; revision=278987
* Use new struct mbufq instead of struct ifqueue to manage packet queues inGleb Smirnoff2015-02-191-2/+2
| | | | | | | | | | IPv4 multicast code. Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=278978
* Kill custom in_matroute() radix mathing function removing one rte mutex lock.Alexander V. Chernikov2014-11-111-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially in_matrote() in_clsroute() in their current state was introduced by r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them in route table, setting RTPRF_OURS flag and some expire time. After that, either GC came or RTPRF_OURS got removed on first-packet. It was a good solution in that days (and probably another decade after that) to keep TCP metrics. However, after moving metrics to TCP hostcache in r122922, most of in_rmx functionality became unused. It might had been used for flushing icmp-originated routes before rte mutexes/refcounting, but I'm not sure about that. So it looks like this is nearly impossible to make GC do its work nowadays: in_rtkill() ignores non-RTPRF_OURS routes. route can only become RTPRF_OURS after dropping last reference via rtfree() which calls in_clsroute(), which, it turn, ignores UP and non-RTF_DYNAMIC routes. Dynamic routes can still be installed via received redirect, but they have default lifetime (no specific rt_expire) and no one has another trie walker to call RTFREE() on them. So, the changelist: * remove custom rnh_match / rnh_close matching function. * remove all GC functions * partially revert r256695 (proto3 is no more used inside kernel, it is not possible to use rt_expire from user point of view, proto3 support is not complete) * Finish r241884 (similar to this commit) and remove remaining IPv6 parts MFC after: 1 month Notes: svn path=/head/; revision=274363
* Update the IPv4 input path to handle reassembled frames and incoming framesAdrian Chadd2014-09-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | with no RSS hash. When doing RSS: * Create a new IPv4 netisr which expects the frames to have been verified; it just directly dispatches to the IPv4 input path. * Once IPv4 reassembly is done, re-calculate the RSS hash with the new IP and L3 header; then reinject it as appropriate. * Update the IPv4 netisr to be a CPU affinity netisr with the RSS hash function (rss_soft_m2cpuid) - this will do a software hash if the hardware doesn't provide one. NICs that don't implement hardware RSS hashing will now benefit from RSS distribution - it'll inject into the correct destination netisr. Note: the netisr distribution doesn't work out of the box - netisr doesn't query RSS for how many CPUs and the affinity setup. Yes, netisr likely shouldn't really be doing CPU stuff anymore and should be "some kind of 'thing' that is a workqueue that may or may not have any CPU affinity"; that's for a later commit. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan Notes: svn path=/head/; revision=271300
* in_ifadown() can be void.Gleb Smirnoff2013-11-011-1/+1
| | | | Notes: svn path=/head/; revision=257500
* Cleanup in_ifscrub(), which is just an entry to in_scrubprefix().Gleb Smirnoff2013-11-011-1/+0
| | | | Notes: svn path=/head/; revision=257499
* Uninline inm_lookup_locked(). Now in_var.h doesn't dereferenceGleb Smirnoff2013-10-291-43/+2
| | | | | | | | | | fields of struct ifnet. Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=257325
* Hide 'struct ifaddr' definition from userland. Two tools left that use it,Gleb Smirnoff2013-10-151-0/+2
| | | | | | | | | | | namely ipftest(1) and ifmcstat(1). These sniff structure definition using _WANT_IFADDR define. Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=256518
* Back out r249318, r249320 and r249327 due to a heisenbug mostAndre Oppermann2013-05-061-1/+1
| | | | | | | | likely related to a race condition in the ipi_hash_lock with the exact cause currently unknown but under investigation. Notes: svn path=/head/; revision=250300
* Fix build.Gleb Smirnoff2013-04-101-1/+1
| | | | Notes: svn path=/head/; revision=249327
* Do not check if found IPv4 rte is dynamic if net.inet.icmp.drop_redirect isAlexander V. Chernikov2012-10-101-0/+2
| | | | | | | | | | | | | | | | | | | | | enabled. This eliminates one mtx_lock() per each routing lookup thus improving performance in several cases (routing to directly connected interface or routing to default gateway). Icmp redirects should not be used to provide routing direction nowadays, even for end hosts. Routers should not use them too (and this is explicitly restricted in IPv6, see RFC 4861, clause 8.2). Current commit changes rnh_machaddr function to 'stock' rn_match (and back) for every AF_INET routing table in given VNET instance on drop_redirect sysctl change. This change is part of bigger patch eliminating rte locking. Sponsored by: Yandex LLC MFC after: 2 weeks Notes: svn path=/head/; revision=241406
* When traversing global in_ifaddr list in the IFP_TO_IA() macro, we needGleb Smirnoff2012-07-181-2/+4
| | | | | | | to obtain IN_IFADDR_RLOCK(). Notes: svn path=/head/; revision=238572
* Provide IA_MASKSIN() macro similar to IA_SIN() and IA_DSTSIN().Gleb Smirnoff2012-01-081-0/+1
| | | | Notes: svn path=/head/; revision=229815
* Convert all users of IF_ADDR_LOCK to use new locking macros that specifyJohn Baldwin2012-01-051-2/+2
| | | | | | | | | | either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks Notes: svn path=/head/; revision=229621
* A major overhaul of the CARP implementation. The ip_carp.c was startedGleb Smirnoff2011-12-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from scratch, copying needed functionality from the old implemenation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on a Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1] Notes: svn path=/head/; revision=228571
* Remove last remnants of classful addressing:Gleb Smirnoff2011-10-151-6/+3
| | | | | | | | | | | | | | | - Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr. - Remove net.inet.ip.subnetsarelocal, I bet no one need it in 2011. - fix bug when we were not forwarding to a host which matches classful net address. For example router having 192.168.x.y/16 network attached, would not forward traffic to 192.168.*.0, which are legal IPs in CIDR world. - For compatibility, leave autoguessing of mask based on class. Reviewed by: andre, bz, rwatson Notes: svn path=/head/; revision=226401
* The statically configured (permanent) ARP entries are removed when anQing Li2011-05-201-1/+1
| | | | | | | | | | | | | interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid. Reviewed by: delphij MFC after: 5 days Notes: svn path=/head/; revision=222143
* Remove unused VNET_SET() and related macros; only VNET_GET() isRobert Watson2009-07-161-3/+3
| | | | | | | | | | | | ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib) Notes: svn path=/head/; revision=195727
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorRobert Watson2009-07-141-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith) Notes: svn path=/head/; revision=195699
* Initialize in_ifaddr_lock using RW_SYSINIT() instead of in ip_init(),Robert Watson2009-06-251-1/+0
| | | | | | | | | | so that it doesn't run multiple times if VIMAGE is being used. Discussed with: bz MFC after: 6 weeks Notes: svn path=/head/; revision=194962
* Add a new global rwlock, in_ifaddr_lock, which will synchronize use of theRobert Watson2009-06-251-0/+11
| | | | | | | | | | | | | | | | | | | | | | in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz Notes: svn path=/head/; revision=194951
* Modify most routines returning 'struct ifaddr *' to return referencesRobert Watson2009-06-231-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions) Notes: svn path=/head/; revision=194760
* Remove bogus comment.Warner Losh2009-05-091-1/+1
| | | | Notes: svn path=/head/; revision=191943
* make LLTABLE visible to netinetKip Macy2009-04-151-0/+2
| | | | Notes: svn path=/head/; revision=191120
* Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSDBruce M Simpson2009-03-091-107/+223
| | | | | | | | | | | | | | IPv4 stack. Diffs are minimized against p4. PCS has been used for some protocol verification, more widespread testing of recorded sources in Group-and-Source queries is needed. sizeof(struct igmpstat) has changed. __FreeBSD_version is bumped to 800070. Notes: svn path=/head/; revision=189592