summaryrefslogtreecommitdiff
path: root/sys/netinet/udp_usrreq.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge r204826 from head to stable/8:Robert Watson2010-06-031-12/+5
| | | | | | | | | | | | | | Make udp_set_kernel_tunneling() less forgiving when its invariants are violated: so_pcb can never be NULL for a valid UDP socket, and it is always SOCK_DGRAM. Use sotoinpcb() as the rest of the UDP code does. Reviewed by: bz Sponsored by: Juniper Networks Approved by: re (kib) Notes: svn path=/stable/8/; revision=208767
* MFC r207369:Bjoern A. Zeeb2010-05-061-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Notes: svn path=/stable/8/; revision=207695
* MFC r205251:Bjoern A. Zeeb2010-03-271-3/+15
| | | | | | | | | | | Add pcb reference counting to the pcblist sysctl handler functions to ensure type stability while caching the pcb pointers for the copyout. Reviewed by: rwatson Notes: svn path=/stable/8/; revision=205762
* MFC r204807:Bjoern A. Zeeb2010-03-271-0/+3
| | | | | | | | | Destroy UDP UMA zones (empty or not) upon network stack teardown to not leak them making UMA/vmstat -z unhappy with every stoped vnet. We will still leak pages (especially as zones are marked NOFREE). Notes: svn path=/stable/8/; revision=205759
* Many network stack subsystems use a single global data structure to holdRobert Watson2009-08-021-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | all pertinent statatistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events. Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE. The following modules are affected by this change: if_bridge if_cxgb if_gif ip_mroute ipdivert pf In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too agressive for this point in the release cycle. Reviewed by: bz Approved by: re (kib) Notes: svn path=/head/; revision=196039
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andRobert Watson2009-08-011-1/+0
| | | | | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket) Notes: svn path=/head/; revision=196019
* Remove unused VNET_SET() and related macros; only VNET_GET() isRobert Watson2009-07-161-1/+1
| | | | | | | | | | | | ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib) Notes: svn path=/head/; revision=195727
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorRobert Watson2009-07-141-34/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith) Notes: svn path=/head/; revision=195699
* Added support for NAT-Traversal (RFC 3948) in IPsec stack.VANHULLEBUS Yvan2009-06-121-0/+258
| | | | | | | | | | | | | | | | | Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ Notes: svn path=/head/; revision=194062
* Introduce an infrastructure for dismantling vnet instances.Marko Zec2009-06-081-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementaly fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor) Notes: svn path=/head/; revision=193731
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICRobert Watson2009-06-051-1/+0
| | | | | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd Notes: svn path=/head/; revision=193511
* Add hierarchical jails. A jail may further virtualize its environmentJamie Gritton2009-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor) Notes: svn path=/head/; revision=192895
* Implement UDP control block support.Bjoern A. Zeeb2009-05-231-37/+65
| | | | | | | | | | | | | | | | So far the udp_tun_func_t had been (ab)using inp_ppcb for udp in kernel tunneling callbacks. Move that into the udpcb and add a field for flags there to be used by upcoming changes instead of sticking udp only flags into in_pcb flags2. Bump __FreeBSD_version for ports to detect it and because of vnet* struct size changes. Submitted by: jhb (7.x version) Reviewed by: rwatson Notes: svn path=/head/; revision=192649
* Permit buiding kernels with options VIMAGE, restricted to only a singleMarko Zec2009-04-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_* macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_* family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor) Notes: svn path=/head/; revision=191688
* In preparation for turning on options VIMAGE in next commits,Marko Zec2009-04-261-0/+1
| | | | | | | | | | | rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor) Notes: svn path=/head/; revision=191548
* Update stats in struct udpstat using two new macros, UDPSTAT_ADD()Robert Watson2009-04-121-11/+11
| | | | | | | | | | | | and UDPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days Notes: svn path=/head/; revision=190962
* Update stats in struct ipstat using four new macros, IPSTAT_ADD(),Robert Watson2009-04-111-1/+1
| | | | | | | | | | | | IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days Notes: svn path=/head/; revision=190951
* Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSDBruce M Simpson2009-03-091-53/+16
| | | | | | | | | | | | | | IPv4 stack. Diffs are minimized against p4. PCS has been used for some protocol verification, more widespread testing of recorded sources in Group-and-Source queries is needed. sizeof(struct igmpstat) has changed. __FreeBSD_version is bumped to 800070. Notes: svn path=/head/; revision=189592
* Standardize the various prison_foo_ip[46] functions and prison_if toJamie Gritton2009-02-051-5/+5
| | | | | | | | | | | | | | | | | | return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor) Notes: svn path=/head/; revision=188144
* Addresses Roberts comments on comments. Also addsRandall Stewart2009-01-061-11/+9
| | | | | | | | | | the KASSERT and checks suggested. Reviewed by: The udp tunneling was discussed on net@ under the thread entitled "Heads up -- Thinking about UDP and tunneling" Notes: svn path=/head/; revision=186821
* Add the ability of an alternate transport protocolRandall Stewart2009-01-061-8/+85
| | | | | | | | | to easily tunnel over udp by providing a hook function that will be called instead of appending to the socket buffer. Notes: svn path=/head/; revision=186813
* Conditionally compile out V_ globals while instantiating the appropriateMarko Zec2008-12-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instatiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation Notes: svn path=/head/; revision=185895
* Rather than using hidden includes (with cicular dependencies),Bjoern A. Zeeb2008-12-021-0/+1
| | | | | | | | | | | | | | directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=185571
* MFp4:Bjoern A. Zeeb2008-11-291-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible Notes: svn path=/head/; revision=185435
* Merge more of currently non-functional (i.e. resolving toMarko Zec2008-11-261-1/+2
| | | | | | | | | | | | | | | | | | | | whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation Notes: svn path=/head/; revision=185348
* Change the initialization methodology for global variables scheduledMarko Zec2008-11-191-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | for virtualization. Instead of initializing the affected global variables at instatiation, assign initial values to them in initializer functions. As a rule, initialization at instatiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentialy, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to intialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation Notes: svn path=/head/; revision=185088
* Add cr_canseeinpcb() doing checks using the cached socketBjoern A. Zeeb2008-10-171-3/+2
| | | | | | | | | | | | | credentials from inp_cred which is also available after the socket is gone. Switch cr_canseesocket consumers to cr_canseeinpcb. This removes an extra acquisition of the socket lock. Reviewed by: rwatson MFC after: 3 months (set timer; decide then) Notes: svn path=/head/; revision=183982
* Cache so_cred as inp_cred in the inpcb.Bjoern A. Zeeb2008-10-041-1/+1
| | | | | | | | | | | | | | | This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred Notes: svn path=/head/; revision=183606
* Step 1.5 of importing the network stack virtualization infrastructureMarko Zec2008-10-021-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation Notes: svn path=/head/; revision=183550
* Another missed V_ instanceJulian Elischer2008-08-251-1/+1
| | | | Notes: svn path=/head/; revision=182148
* Commit step 1 of the vimage project, (network stack)Bjoern A. Zeeb2008-08-171-80/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch Notes: svn path=/head/; revision=181803
* Fix a regression introduced in r179289 splitting up ip6_savecontrol()Bjoern A. Zeeb2008-08-161-1/+1
| | | | | | | | | | | | | | | into v4-only vs. v6-only inp_flags processing. When ip6_savecontrol_v4() is called from ip6_savecontrol() we were not passing back the **mp thus the information will be missing in userland. Istead of going with a *** as suggested in the PR we are returning **mp now and passing in the v4only flag as a pointer argument. PR: kern/126349 Reviewed by: rwatson, dwmalone Notes: svn path=/head/; revision=181782
* Increase UDBHASHSIZE from 16 to 128 items.Alexander Motin2008-07-261-1/+1
| | | | | | | | Previous value was chosen 10 years ago and not very effective now. This change gives several percents speedup on 1000 L2TP mpd links. Notes: svn path=/head/; revision=180836
* Document a few sysctls.Tom Rhodes2008-07-201-1/+1
| | | | | | | Reviewed by: rwatson Notes: svn path=/head/; revision=180631
* Fix error in comment.Robert Watson2008-07-161-3/+3
| | | | | | | MFC after: 3 weeks Notes: svn path=/head/; revision=180558
* Merge last of a series of rwlock conversion changes to UDP, whichRobert Watson2008-07-151-34/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | completes the move to a fully parallel UDP transmit path by using global read, rather than write, locking of inpcbinfo in further semi-connected cases: - Add macros to allow try-locking of inpcb and inpcbinfo. - Always acquire an incpcb read lock in udp_output(), which stablizes the local inpcb address and port bindings in order to determine what further locking is required: - If the inpcb is currently not bound (at all) and are implicitly connecting, we require inpcbinfo and inpcb write locks, so drop the read lock and re-acquire. - If the inpcb is bound for at least one of the port or address, but an explicit source or destination is requested, trylock the inpcbinfo lock, and if that fails, drop the inpcb lock, lock the global lock, and relock the inpcb lock. - Otherwise, no further locking is required (common case). - Update comments. In practice, this means that the vast majority of consumers of UDP sockets will not acquire any exclusive locks at the socket or UDP levels of the network stack. This leads to a marked performance improvement in several important workloads, including BIND, nsd, and memcached over UDP, as well as significant improvements in pps microbenchmarks. The plan is to MFC all of the rwlock changes to RELENG_7 once they have settled for a weeks in the tree. Tested by: ps, kris (older revision), bde MFC after: 3 weeks Notes: svn path=/head/; revision=180536
* Slightly rearrange validation of UDP arguments and jail processing inRobert Watson2008-07-101-4/+25
| | | | | | | | | | | | udp_output() so that argument validation occurs before jail processing. Add additional comments explaining what's going on when we process addresses and binding during udp_output(). MFC after: 3 weeks Notes: svn path=/head/; revision=180429
* Apply the MAC label to an outgoing UDP packet when other inpcb properties areRobert Watson2008-07-101-4/+4
| | | | | | | | | | processed, meaning that we avoid the cost of MAC label assignment if we're going to drop the packet due to mbuf exhaustion, etc. MFC after: 3 weeks Notes: svn path=/head/; revision=180422
* Allow udp_notify() to accept read, as well as write, locks on the passedRobert Watson2008-07-071-3/+9
| | | | | | | | | | | | inpcb. When directly invoking udp_notify() from udp_ctlinput(), acquire only a read lock; we may still see write locks in udp_notify() as the in_pcbnotifyall() routine is shared with TCP and always uses a write lock on the inpcb being notified. MFC after: 1 month Notes: svn path=/head/; revision=180348
* Add additional udbinfo and inpcb locking assertions to udp_output(); forRobert Watson2008-07-071-0/+6
| | | | | | | | | | | some code paths, global or inpcb write locks are required, but for other code paths, read locks or no locking at all are sufficient for the data structures. MFC after: 1 month Notes: svn path=/head/; revision=180346
* First step towards parallel transmit in UDP: if neither a specificRobert Watson2008-07-071-5/+12
| | | | | | | | | | | | | | | | source or a specific destination address is requested as part of a send on a UDP socket, read lock the inpcb rather than write lock it. This will allow fully parallel transmit down to the IP layer when sending simultaneously from multiple threads on a connected UDP socket. Parallel transmit for more complex cases, such as when sendto(2) is invoked with an address and there's already a local binding, will follow. MFC after: 1 month Notes: svn path=/head/; revision=180344
* Drop read lock on udbinfo earlier during delivery to the last matchingRobert Watson2008-07-071-3/+5
| | | | | | | | | | UDP socket for a datagram; the inpcb read lock is sufficient to provide inpcb stability during udp_append(). MFC after: 1 month Notes: svn path=/head/; revision=180338
* Add soreceive_dgram(9), an optimized socket receive function for use byRobert Watson2008-07-021-0/+1
| | | | | | | | | | | | | | | | | datagram-only protocols, such as UDP. This version removes use of sblock(), which is not required due to an inability to interlace data improperly with datagrams, as well as avoiding some of the larger loops and state management that don't apply on datagram sockets. This is experimental code, so hook it up only for UDPv4 for testing; if there are problems we may need to revise it or turn it off by default, but it offers *significant* performance improvements for threaded UDP applications such as BIND9, nsd, and memcached using UDP. Tested by: kris, ps Notes: svn path=/head/; revision=180198
* In udp_append() and udp_input(), make use of read locking on incpbsRobert Watson2008-06-301-8/+8
| | | | | | | | | | | | | | | | | | rather than write locking: while we need to maintain a valid reference to the inpcb and fix its state, no protocol layer state is modified during an IPv4 UDP receive -- there are only changes at the socket layer, which is separately protected by socket locking. While parallel concurrent receive on a single UDP socket is currently relatively unusual, introducing read locking in the transmit path, allowing concurrent receive and transmit, will significantly improve performance for loads such as BIND, memcached, etc. MFC after: 2 months Tested by: gnn, kris, ps Notes: svn path=/head/; revision=180127
* Employ read locks on UDP inpcbs, rather than write locks, whenRobert Watson2008-05-291-13/+18
| | | | | | | | | | | monitoring UDP connections using sysctls. In some cases, add previously missing locking of inpcbs, as inp_socket is followed, which also allows us to drop global locks more quickly. MFC after: 1 week Notes: svn path=/head/; revision=179412
* Factor out the v4-only vs. the v6-only inp_flags processing inBjoern A. Zeeb2008-05-241-8/+3
| | | | | | | | | | | | ip6_savecontrol in preparation for udp_append() to no longer need an WLOCK as we will no longer be modifying socket options. Requested by: rwatson Reviewed by: gnn MFC after: 10 days Notes: svn path=/head/; revision=179289
* Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros toRobert Watson2008-04-171-34/+36
| | | | | | | | | | | | | | | | | | explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committered patch) Notes: svn path=/head/; revision=178285
* Merge first in a series of TrustedBSD MAC Framework KPI changesRobert Watson2007-10-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer Notes: svn path=/head/; revision=172930
* Add FBSDID to all files in netinet so that people can moreMike Silbersack2007-10-071-1/+3
| | | | | | | | | easily include file version information in bug reports. Approved by: re (kensmith) Notes: svn path=/head/; revision=172467
* Further UDPv4 cleanup:Robert Watson2007-09-101-17/+17
| | | | | | | | | | | | | | | - Resort includes a bit. - Correct typos and wording problems in comments. - Rename udpcksum to udp_cksum to be consistent with other UDP-related configuration variables. - Remove indirection of udp_notify through local notify variable in udp_ctlinput(), which is presumably due to copying and pasting from TCP, where multiple notify routines exist. Approved by: re (kensmith) Notes: svn path=/head/; revision=172114