summaryrefslogtreecommitdiff
path: root/sys/netinet6/in6_rmx.c
Commit message (Collapse)AuthorAgeFilesLines
* Introduce scalable route multipath.Alexander V. Chernikov2020-10-031-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449 Notes: svn path=/head/; revision=366390
* net: clean up empty lines in .c and .h filesMateusz Guzik2020-09-011-1/+0
| | | | Notes: svn path=/head/; revision=365071
* Move net/route/shared.h definitions to net/route/route_var.h.Alexander V. Chernikov2020-08-281-1/+0
| | | | | | | | | | | | | | | | No functional changes. net/route/shared.h was created in the inital phases of nexthop conversion. It was intended to serve the same purpose as route_var.h - share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largerly defeats its purpose. As currently this is not the case anymore and amount of route_var.h includes is roughly the same as shared.h, retire the latter in favour of the former. Notes: svn path=/head/; revision=364941
* Simplify dom_<rtattach|rtdetach>.Alexander V. Chernikov2020-08-141-16/+8
| | | | | | | | | | | | | | Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'. Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach. Add rib_subscribe_internal() function to accept subscriptions to the rnh directly. Differential Revision: https://reviews.freebsd.org/D26053 Notes: svn path=/head/; revision=364238
* Fix rib_subscribe() waitok flag by performing allocation outside epoch.Alexander V. Chernikov2020-08-111-7/+4
| | | | | | | | | | Make in6_inithead() use rib_subscribe with waitok to achieve reliable subscription allocation. Reviewed by: glebius Notes: svn path=/head/; revision=364099
* Transition from rtrequest1_fib() to rib_action().Alexander V. Chernikov2020-07-211-11/+0
| | | | | | | | | | | | Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546 Notes: svn path=/head/; revision=363403
* Temporarly revert r363319 to unbreak the build.Alexander V. Chernikov2020-07-191-0/+11
| | | | | | | | Reported by: CI Pointy hat to: melifaro Notes: svn path=/head/; revision=363320
* Transition from rtrequest1_fib() to rib_action().Alexander V. Chernikov2020-07-191-11/+0
| | | | | | | | | | | | Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546 Notes: svn path=/head/; revision=363319
* Switch inet6 default route subscription to the new rib subscription api.Alexander V. Chernikov2020-07-121-0/+8
| | | | | | | | | | | Old subscription model allowed only single customer. Switch inet6 to the new subscription api and eliminate the old model. Differential Revision: https://reviews.freebsd.org/D25615 Notes: svn path=/head/; revision=363128
* * Add rib_<add|del|change>_route() functions to manipulate the routing table.Alexander V. Chernikov2020-06-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067 Notes: svn path=/head/; revision=361706
* Revert r361704, it accidentally committed merged D25067 and D25070.Alexander V. Chernikov2020-06-011-1/+0
| | | | Notes: svn path=/head/; revision=361705
* * Add rib_<add|del|change>_route() functions to manipulate the routing table.Alexander V. Chernikov2020-06-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067 Notes: svn path=/head/; revision=361704
* Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields.Alexander V. Chernikov2020-05-041-52/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | After converting routing subsystem customers to use nexthop objects defined in r359823, some fields in struct rtentry became unused. This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry along with the code initializing and updating these fields. Cleanup of the remaining fields will be addressed by D24669. This commit also changes the implementation of the RTM_CHANGE handling. Old implementation tried to perform the whole operation under radix WLOCK, resulting in slow performance and hacks like using RTF_RNH_LOCKED flag. New implementation looks up the route nexthop under radix RLOCK, creates new nexthop and tries to update rte nhop pointer. Only last part is done under WLOCK. In the hypothetical scenarious where multiple rtsock clients repeatedly issue RTM_CHANGE requests for the same route, route may get updated between read and update operation. This is addressed by retrying the operation multiple (3) times before returning failure back to the caller. Differential Revision: https://reviews.freebsd.org/D24666 Notes: svn path=/head/; revision=360629
* Move route_temporal.c and route_var.h to net/route.Alexander V. Chernikov2020-04-281-1/+1
| | | | | | | | | | | | Nexthop objects implementation, defined in r359823, introduced sys/net/route directory intended to hold all routing-related code. Move recently-introduced route_temporal.c and private route_var.h header there. Differential Revision: https://reviews.freebsd.org/D24597 Notes: svn path=/head/; revision=360449
* Eliminate now-unused parts of old routing KPI.Alexander V. Chernikov2020-04-281-20/+0
| | | | | | | | | | | | r360292 switched most of the remaining routing customers to a new KPI, leaving a bunch of wrappers for old routing lookup functions unused. Remove them from the tree as a part of routing cleanup. Differential Revision: https://reviews.freebsd.org/D24569 Notes: svn path=/head/; revision=360430
* Plug netmask NULL check during route addition causing kernel panic.Alexander V. Chernikov2020-04-141-1/+1
| | | | | | | | | This bug was introduced by the r359823. Reported by: hselasky Notes: svn path=/head/; revision=359917
* Introduce nexthop objects and new routing KPI.Alexander V. Chernikov2020-04-121-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the foundational change for the routing subsytem rearchitecture. More details and goals are available in https://reviews.freebsd.org/D24141 . This patch introduces concept of nexthop objects and new nexthop-based routing KPI. Nexthops are objects, containing all necessary information for performing the packet output decision. Output interface, mtu, flags, gw address goes there. For most of the cases, these objects will serve the same role as the struct rtentry is currently serving. Typically there will be low tens of such objects for the router even with multiple BGP full-views, as these objects will be shared between routing entries. This allows to store more information in the nexthop. New KPI: struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid); struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid); These 2 function are intended to replace all all flavours of <in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous fib[46]-generation functions. Upon successful lookup, they return nexthop object which is guaranteed to exist within current NET_EPOCH. If longer lifetime is desired, one can specify NHR_REF as a flag and get a referenced version of the nexthop. Reference semantic closely resembles rtentry one, allowing sed-style conversion. Additionally, another 2 functions are introduced to support uRPF functionality inside variety of our firewalls. Their primary goal is to hide the multipath implementation details inside the routing subsystem, greatly simplifying firewalls implementation: int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet *src_if); int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid, uint32_t flags, const struct ifnet *src_if); All functions have a separate scopeid argument, paving way to eliminating IPv6 scope embedding and allowing to support IPv4 link-locals in the future. Structure changes: * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size. * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz. Old KPI: During the transition state old and new KPI will coexists. As there are another 4-5 decent-sized conversion patches, it will probably take a couple of weeks. To support both KPIs, fields not required by the new KPI (most of rtentry) has to be kept, resulting in the temporary size increase. Once conversion is finished, rtentry will notably shrink. More details: * architectural overview: https://reviews.freebsd.org/D24141 * list of the next changes: https://reviews.freebsd.org/D24232 Reviewed by: ae,glebius(initial version) Differential Revision: https://reviews.freebsd.org/D24232 Notes: svn path=/head/; revision=359823
* Remove per-AF radix_mpath initializtion functions.Alexander V. Chernikov2020-04-111-0/+5
| | | | | | | | | | | Split their functionality by moving random seed allocation to SYSINIT and calling (new) generic multipath function from standard IPv4/IPv5 RIB init handlers. Differential Revision: https://reviews.freebsd.org/D24356 Notes: svn path=/head/; revision=359797
* Bring back redirect route expiration.Alexander V. Chernikov2020-01-221-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redirect (and temporal) route expiration was broken a while ago. This change brings route expiration back, with unified IPv4/IPv6 handling code. It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout. Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions. Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present. Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run. The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal. This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23075 Notes: svn path=/head/; revision=356984
* Add fibnum, family and vnet pointer to each rib head.Alexander V. Chernikov2020-01-091-3/+4
| | | | | | | | | | | | | Having metadata such as fibnum or vnet in the struct rib_head is handy as it eases building functionality in the routing space. This change is required to properly bring back route redirect support. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23047 Notes: svn path=/head/; revision=356559
* Remove useless code from in6_rmx.cAlexander V. Chernikov2019-12-181-63/+0
| | | | | | | | | | | | | | | | | | The code in questions walks IPv6 tree every 60 seconds and looks into the routes with non-zero expiration time (typically, redirected routes). For each such route it sets RTF_PROBEMTU flag at the expiration time. No other part of the kernel checks for RTF_PROBEMTU flag. RTF_PROBEMTU was defined 21 years ago, 30 Jun 1999, as RTF_PROTO1. RTF_PROTO1 is a de-facto standard indication of a route installed by a routing daemon for a last decade. Reviewed by: bz, ae MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22865 Notes: svn path=/head/; revision=355888
* Use the new VNET_DEFINE_STATIC macro when we are defining static VNETAndrew Turner2018-07-241-2/+2
| | | | | | | | | | | variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147 Notes: svn path=/head/; revision=336676
* sys: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-201-0/+2
| | | | | | | | | | | | | | | | | Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Notes: svn path=/head/; revision=326023
* Code duplication but rib_head is special. Not found an easy way to goBjoern A. Zeeb2016-02-031-1/+3
| | | | | | | | | | | back and harmize the use cases among RIB, IPFW, PF yet but it's also not the scope of this work. Prevents instant panics on teardown and frees the FIB bits again. Sponsored by: The FreeBSD Foundation Notes: svn path=/projects/vnet/; revision=295220
* MFP r287070,r287073: split radix implementation and route table structure.Alexander V. Chernikov2016-01-251-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon. Notes: svn path=/head/; revision=294706
* Remove prefix check from in6_addroute().Alexander V. Chernikov2016-01-091-29/+1
| | | | | | | | | | | | | | | | This check was added in initial? netinet6/ import back in 1999 (r53541). It effectively became unnecessary after 'address/prefix clean-ups' KAME commit 90ff8792e676132096a440dd787f99a5a5860ee8 (github) in 2001 (merged to FreeBSD in r78064) where prefix check was added to nd6_prefix_onlink(). Similar IPv4 check (in_addroute() was added in r137628). Additionally, the right plance for this (or similar) check is the prefix addition code (nd6_prefix_onlink(), nd6_prefix_onlink_rtrequest(), in_addprefix() or rtinit()), but not the generic radix insert routine. Notes: svn path=/head/; revision=293471
* Remove superfluous return (1) missed in r292601.Bjoern A. Zeeb2015-12-231-1/+0
| | | | | | | | | | | Reported by: Matthew D. Fuller (fullermd over-yonder.net), Kevin Bowling (kevin.bowling kev009.com) MFC after: 13 days X-MFC with: r292601 Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=292654
* Since r256624 we've been leaking routing table allocationsBjoern A. Zeeb2015-12-221-0/+1
| | | | | | | | | | | | | on vnet enabled jail shutdown. Call the provided cleanup routines for IP versions 4 and 6 to plug these leaks. Sponsored by: The FreeBSD Foundation MFC atfer: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4530 Notes: svn path=/head/; revision=292601
* Rename rt_foreach_fib() to rt_foreach_fib_walk().Alexander V. Chernikov2015-08-101-2/+3
| | | | | | | Suggested by: julian Notes: svn path=/head/; revision=286594
* MFP r274295:Alexander V. Chernikov2015-08-081-21/+9
| | | | | | | | | * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles. Notes: svn path=/head/; revision=286458
* CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than tenJung-uk Kim2015-05-221-1/+1
| | | | | | | | | | | | | years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks Notes: svn path=/head/; revision=283291
* Kill custom in_matroute() radix mathing function removing one rte mutex lock.Alexander V. Chernikov2014-11-111-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially in_matrote() in_clsroute() in their current state was introduced by r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them in route table, setting RTPRF_OURS flag and some expire time. After that, either GC came or RTPRF_OURS got removed on first-packet. It was a good solution in that days (and probably another decade after that) to keep TCP metrics. However, after moving metrics to TCP hostcache in r122922, most of in_rmx functionality became unused. It might had been used for flushing icmp-originated routes before rte mutexes/refcounting, but I'm not sure about that. So it looks like this is nearly impossible to make GC do its work nowadays: in_rtkill() ignores non-RTPRF_OURS routes. route can only become RTPRF_OURS after dropping last reference via rtfree() which calls in_clsroute(), which, it turn, ignores UP and non-RTF_DYNAMIC routes. Dynamic routes can still be installed via received redirect, but they have default lifetime (no specific rt_expire) and no one has another trie walker to call RTFREE() on them. So, the changelist: * remove custom rnh_match / rnh_close matching function. * remove all GC functions * partially revert r256695 (proto3 is no more used inside kernel, it is not possible to use rt_expire from user point of view, proto3 support is not complete) * Finish r241884 (similar to this commit) and remove remaining IPv6 parts MFC after: 1 month Notes: svn path=/head/; revision=274363
* Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.Gleb Smirnoff2014-11-071-1/+1
| | | | | | | Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=274225
* Make checks for rt_mtu generic:Alexander V. Chernikov2014-11-061-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking might be an option in some situation, it is not feasible to do MTU checks there: generic (or per-domain) routing code is perfectly capable of doing this. We currrently have 3 places where MTU is altered: 1) route addition. In this case domain overrides radix _addroute callback (in[6]_addroute) and all necessary checks/fixes are/can be done there. 2) route change (especially, GW change). In this case, there are no explicit per-domain calls, but one can override rte by setting ifa_rtrequest hook to domain handler (inet6 does this). 3) ifconfig ifaceX mtu YYYY In this case, we have no callbacks, but ip[6]_output performes runtime checks and decreases rt_mtu if necessary. Generally, the goals are to be able to handle all MTU changes in control plane, not in runtime part, and properly deal with increased interface MTU. This commit changes the following: * removes hooks setting MTU from drivers side * adds proper per-doman MTU checks for case 1) * adds generic MTU check for case 2) * The latter is done by using new dom_ifmtu callback since if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size. However, IPv6 mtu might be different from if_mtu one (e.g. default 1280) for some cases, so we need an abstract way to know maximum MTU size for given interface and domain. * moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies user-supplied data which must be checked. * removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to use this functions on new non-inserted rte. More changes will follow soon. MFC after: 1 month Sponsored by: Yandex LLC Notes: svn path=/head/; revision=274175
* Remove old hack abusing domattach from NFS code.Alexander V. Chernikov2014-11-051-9/+2
| | | | | | | | | | | | | | | | | | | | | | | According to IANA RPC uaddr registry, there are no AFs except IPv4 and IPv6, so it's not worth being too abstract here. Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries. Use own initialization without relying on domattach code. While I admit that this was one of the rare places in kernel networking code which really was capable of doing multi-AF without any AF-depended code, it is not possible anymore to rely on dom* code. While here, change terrifying "Invalid radix node head, rn:" message, to different non-understandable "netcred already exists for given addr/mask", but less terrifying. Since we know that rn_addaddr() returns NULL if the same record already exists, we should provide more friendly error. MFC after: 1 month Notes: svn path=/head/; revision=274118
* Remove lock init from radix.c.Alexander V. Chernikov2014-10-011-1/+3
| | | | | | | | | Radix has never managed its locking itself. The only consumer using radix with embeded rwlock is system routing table. Move per-AF lock inits there. Notes: svn path=/head/; revision=272361
* - Remove rt_metrics_lite and simply put its members into rtentry.Gleb Smirnoff2014-03-051-6/+5
| | | | | | | | | | | | | | | | | | | - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=262763
* The r48589 promised to remove implicit inclusion of if_var.h soon. PrepareGleb Smirnoff2013-10-261-0/+1
| | | | | | | | | | | to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=257176
* Eliminate code checking if found IPv6 rte is dynamic. IPv6 redirectsAlexander V. Chernikov2012-10-221-156/+0
| | | | | | | | | | | | | | are using (different) ND-based approach described in RFC 4861. This change is similar to r241406 which conditionally skips the same check in IPv4. This change is part of bigger patch eliminating rte locking. Sponsored by: Yandex LLC. OK'd by: hrs MFC after: 2 weeks Notes: svn path=/head/; revision=241884
* Cleanup documentation: cloning route support has been removed in r186119.Alexander V. Chernikov2012-10-131-13/+0
| | | | | | | MFC after: 2 weeks Notes: svn path=/head/; revision=241502
* Add multi-FIB IPv6 support to the core network stack supplementingBjoern A. Zeeb2012-02-031-33/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | the original IPv4 implementation from r178888: - Use RT_DEFAULT_FIB in the IPv4 implementation where noticed. - Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in the NFS code. - Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack where applicable. - Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally prevent multiple initializations of callouts in in6_inithead(). - Use wrapper functions where needed to preserve the current KPI to ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup. - Fix (related) comments (both technical or style). - Convert to rtinit() where applicable and only use custom loops where currently not possible otherwise. - Multicast group, most neighbor discovery address actions and faith(4) are locked to the default FIB. Individual IPv6 addresses will only appear in the default FIB, however redirect information and prefixes of connected subnets are automatically propagated to all FIBs by default (mimicking IPv4 behavior as closely as possible). Sponsored by: Cisco Systems, Inc. Notes: svn path=/projects/multi-fibv6/head/; revision=230942
* Provide the IPv6 counterpart to the extended IPv4 rtalloc(9) KPI.Bjoern A. Zeeb2012-02-031-0/+40
| | | | | | | Sponsored by: Cisco Systems, Inc. Notes: svn path=/projects/multi-fibv6/head/; revision=230936
* After some off-list discussion, revert a number of changes to theDimitry Andric2010-11-221-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE. Notes: svn path=/head/; revision=215701
* Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughoutDimitry Andric2010-11-141-6/+6
| | | | | | | the tree. Notes: svn path=/head/; revision=215317
* Add a bandaid for a long-standing race condition during route entryXin LI2010-09-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | un-expiring. The previous version of code have no locking when testing rt_refcnt. The result of the lack of locking may result in a condition where a routing entry have a reference count but at the same time have RTPRF_OURS bit set and an expiration timer. These would eventually lead to a panic: panic: rtqkill route really not free When the system have ICMP redirects accepted from local gateway in a moderate frequency, for instance. Commit this workaround for now until we have some better solution. PR: kern/149804 Reviewed by: bz Tested by: Zhao Xin, Pete French MFC after: 2 weeks Notes: svn path=/head/; revision=213225
* MFP4: @176978-176982, 176984, 176990-176994, 177441Bjoern A. Zeeb2010-04-291-14/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days Notes: svn path=/head/; revision=207369
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andRobert Watson2009-08-011-1/+0
| | | | | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket) Notes: svn path=/head/; revision=196019
* Remove unused VNET_SET() and related macros; only VNET_GET() isRobert Watson2009-07-161-6/+6
| | | | | | | | | | | | ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib) Notes: svn path=/head/; revision=195727
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorRobert Watson2009-07-141-25/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith) Notes: svn path=/head/; revision=195699
* In r194702 I meant to remove vnet.h which is no longer needed, not route.h.Bjoern A. Zeeb2009-06-231-1/+1
| | | | Notes: svn path=/head/; revision=194714