path: root/sys/dev/netmap/netmap.c
Commit message | Author | Age | Files | Lines
* MFC r281406: | Rui Paulo | 2015-04-18 | 1 | -0/+6
  netmap: improve the netmap attach message on FreeBSD.
  Notes: svn path=/stable/10/; revision=281706
* sync the code with the version in head, with the exception of | Luigi Rizzo | 2015-02-14 | 1 | -14/+36
  svn 275358 (M_FLOWID deprecation, only a couple of lines), which cannot be merged.

  if_lem_netmap.h, if_re_netmap.h:
  - use the same (commented out) function to update the stat counters as in HEAD. This is a no-op here
  netmap.c
  - merge 274459 (support for private knote lock) and minor changes on nm_config and comments
  netmap_freebsd.c
  - merge 274459 (support for private knote lock)
  - merge 274354 (initialize color if passed as argument)
  netmap_generic.c
  - fix a comment
  netmap_kern.h
  - revise the lock macros, using sx locks; merge 274459 (private knote lock)
  netmap_monitor.c
  - use full memory barriers
  netmap_pipe.c
  - use full memory barriers, use length from the correct queue (mostly cosmetic, since the queues typically have the same size)
  Notes: svn path=/stable/10/; revision=278779
* MFC r272111 | Luigi Rizzo | 2014-10-06 | 1 | -15/+10
  fix a panic when passing ifioctl from a netmap file descriptor to the underlying device.
  This needs to be merged to 10.1
  Notes: svn path=/stable/10/; revision=272604
* MFC 270063: update of netmap code | Luigi Rizzo | 2014-08-20 | 1 | -146/+537
  (vtnet and cxgbe not merged yet because we need some other mfc first)
  Notes: svn path=/stable/10/; revision=270252
* MFC 267284 | Luigi Rizzo | 2014-06-10 | 1 | -19/+7
  Fixes from Franco Fichtner on transparent mode
  * The way rings are updated changed with the last API bump. Also sync ->head when moving slots in netmap_sw_to_nic().
  * Remove a crashing selrecord() call.
  * Unclog the logic surrounding netmap_rxsync_from_host().
  * Add timestamping to RX host ring.
  * Remove a couple of obsolete comments.
  Submitted by: Franco Fichtner
  MFC after: 3 days
  Sponsored by: Packetwerk
  Notes: svn path=/stable/10/; revision=267334
* sync netmap code with the version in HEAD: | Luigi Rizzo | 2014-06-09 | 1 | -25/+107
  - fix handling of tx mbufs in emulated netmap mode;
  - introduce mbq_lock() and mbq_unlock()
  - rate limit some error messages
  - many whitespace and comment fixes
  Notes: svn path=/stable/10/; revision=267282
* MFH: sync the netmap code with the one in HEAD | Luigi Rizzo | 2014-02-18 | 1 | -2465/+1753
  (enhanced VALE switch, netmap pipes, emulated netmap mode).
  See details in the log for svn 261909.
  Notes: svn path=/stable/10/; revision=262151
* - fix a bug in the previous commit that was dropping the last packet | Luigi Rizzo | 2013-06-05 | 1 | -11/+39
    from each batch flowing on the VALE switch
  - feature: add glue for 'indirect' buffers on the sender side: if a slot has NS_INDIRECT set, the netmap buffer contains pointer(s) to the actual userspace buffers, which are accessed with copyin().
    The feature is not finalised yet, as it will likely need to deal with some iovec variant for proper scatter/gather support.
    This will save one copy for clients (e.g. qemu) that cannot use the netmap buffer directly.

  A curiosity: on amd64 copyin() appears to be 10-15% faster than pkt_copy() or bcopy() at least for sizes of 256 and greater.
  Notes: svn path=/head/; revision=251425
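The indirect-buffer path described in the commit above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the kernel code: `demo_slot`, `DEMO_NS_INDIRECT`, `DEMO_BUF_SIZE` and `demo_fetch()` are hypothetical names, the flag value is arbitrary, and `memcpy()` stands in for the kernel's `copyin()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NS_INDIRECT 0x0010u  /* illustrative value, not taken from netmap.h */
#define DEMO_BUF_SIZE    2048u    /* hypothetical buffer size */

/* Hypothetical, simplified slot: flags, payload length, and its buffer. */
struct demo_slot {
    uint16_t flags;
    uint16_t len;
    unsigned char buf[DEMO_BUF_SIZE];
};

/*
 * Copy the payload of a slot into dst (capacity dst_cap).
 * Direct slot: the payload lives in slot->buf.
 * Indirect slot: slot->buf holds a pointer to the real payload.
 * Returns the number of bytes copied, or 0 if the payload does not fit.
 */
static size_t
demo_fetch(const struct demo_slot *slot, unsigned char *dst, size_t dst_cap)
{
    const unsigned char *src;

    assert(slot != NULL && dst != NULL);
    if (slot->len > dst_cap)
        return 0;
    src = slot->buf;
    if (slot->flags & DEMO_NS_INDIRECT) {
        /* Read the stored pointer out of the buffer. */
        memcpy(&src, slot->buf, sizeof(src));
    }
    /* In the kernel, the indirect case would go through copyin(). */
    memcpy(dst, src, slot->len);
    return slot->len;
}
```

A sender that cannot place data in the netmap buffer stores a pointer there and sets the flag; the receiver side then pays one copy from the original location instead of two.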
* Bring in a number of new features, mostly implemented by Michio Honda: | Luigi Rizzo | 2013-05-30 | 1 | -309/+1054
  - the VALE switch now supports up to 254 destinations per switch, unicast or broadcast (multicast goes to all ports).
  - we can attach hw interfaces and the host stack to a VALE switch, which means we will be able to use it more or less as a native bridge (minor tweaks still necessary).
    A 'vale-ctl' program is supplied in tools/tools/netmap to attach/detach ports to/from the switch, and list the current configuration.
  - the lookup function in the VALE switch can be reassigned to something else, similar to the pf hooks. This will enable attaching the firewall, or other processing functions (e.g. in-kernel openvswitch) directly on the netmap port.

  The internal API used by device drivers does not change.

  Userspace applications should be recompiled because we bump NETMAP_API as we now use some fields in the struct nmreq that were previously ignored -- otherwise, data structures are the same.

  Manpages will be committed separately.
  Notes: svn path=/head/; revision=251139
* remove trailing whitespace | Luigi Rizzo | 2013-05-02 | 1 | -3/+3
  Notes: svn path=/head/; revision=250184
* Partial cleanup in preparation for upcoming changes: | Luigi Rizzo | 2013-04-30 | 1 | -30/+70
  - netmap_rx_irq()/netmap_tx_irq() can now be called by FreeBSD drivers hiding the logic for handling NIC interrupts in netmap mode. This also simplifies the case of NICs attached to VALE switches.
    Individual drivers will be updated with separate commits.
  - use the same refcount() API for FreeBSD and linux
  - plus some comments, typos and formatting fixes

  Portions contributed by Michio Honda
  Notes: svn path=/head/; revision=250107
* whitespace changes: | Luigi Rizzo | 2013-04-29 | 1 | -0/+3
  remove $Id$ lines, and add blank lines around some #if / #elif / #endif
  Notes: svn path=/head/; revision=250052
* mostly whitespace changes: | Luigi Rizzo | 2013-04-19 | 1 | -7/+3
  - remove vestiges of the old memory allocator
  - clean up some comments
  Notes: svn path=/head/; revision=249659
* Switch the vm_object mutex to be a rwlock. This will enable in the | Attilio Rao | 2013-03-09 | 1 | -0/+1
  future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages is accessed for reading purposes.

  The change is mostly mechanical but a few notes are reported:
  * The KPI changes as follows:
    - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
    - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
    - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
    - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details)
    - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
  * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h.
  * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.

  The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example).

  Sponsored by: EMC / Isilon storage division
  Reviewed by: jeff
  Reviewed by: pjd (ZFS specific review)
  Discussed with: alc
  Tested by: pho
  Notes: svn path=/head/; revision=248084
* Add support for transparent mode while in netmap. | Luigi Rizzo | 2013-01-23 | 1 | -30/+179
  By setting dev.netmap.fwd=1 (or enabling the feature with a per-ring flag), packets are forwarded between the NIC and the host stack unless the netmap client clears the NS_FORWARD flag on the individual descriptors.

  This feature greatly simplifies applications where some traffic (think of ARP, control traffic, ssh sessions...) must be processed by the host stack, whereas the bulk is handled by the netmap process which simply (un)marks packets that should not be forwarded. The default is chosen so that now a netmap receiver operates in a mode very similar to bpf.

  Of course there is no free lunch: traffic to/from the host stack still operates at OS speed (or less, as there is one extra copy in one direction).
  HOWEVER, since traffic goes to the user process before being reinjected, and reinjection occurs in a user context, you get some form of livelock protection for free.
  Notes: svn path=/head/; revision=245836
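The per-descriptor decision described in the commit above can be sketched as a small userspace model. This is an assumption-laden illustration rather than the netmap API: `demo_fwd_slot`, `DEMO_NS_FORWARD`, `demo_filter()` and `demo_is_ipv4()` are hypothetical names, the flag value is arbitrary, and the `ethertype` field merely stands in for whatever the client inspects to pick its own traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NS_FORWARD 0x0004u  /* illustrative value, not taken from netmap.h */

/* Hypothetical, simplified descriptor. */
struct demo_fwd_slot {
    uint16_t flags;      /* arrives with DEMO_NS_FORWARD set */
    uint16_t ethertype;  /* stand-in for the inspected packet field */
};

typedef int (*demo_keep_fn)(const struct demo_fwd_slot *);

/*
 * Clear the forward flag on every slot the client wants to keep.
 * Slots left with the flag set would be passed on to the other side.
 * Returns how many slots remain marked for forwarding.
 */
static size_t
demo_filter(struct demo_fwd_slot *slots, size_t n, demo_keep_fn keep)
{
    size_t forwarded = 0;

    assert(slots != NULL && keep != NULL);
    for (size_t i = 0; i < n; i++) {
        if (keep(&slots[i]))
            slots[i].flags &= (uint16_t)~DEMO_NS_FORWARD;
        if (slots[i].flags & DEMO_NS_FORWARD)
            forwarded++;
    }
    return forwarded;
}

/* Example predicate: the client keeps IPv4 and lets the rest through. */
static int
demo_is_ipv4(const struct demo_fwd_slot *s)
{
    return s->ethertype == 0x0800;
}
```

With such a predicate, ARP and other control traffic keeps its flag and reaches the host stack, while the bulk traffic stays with the netmap client.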
* control some debugging messages with dev.netmap.verbose | Luigi Rizzo | 2013-01-23 | 1 | -43/+86
  add infrastructure to adapt to changes in number of queues and buffers at runtime
  Notes: svn path=/head/; revision=245835
* Fix build. | Gleb Smirnoff | 2012-10-19 | 1 | -2/+2
  Notes: svn path=/head/; revision=241723
* This is an import of code, mostly from Giuseppe Lettieri, | Luigi Rizzo | 2012-10-19 | 1 | -89/+298
  that revises the netmap memory allocator so that the various parameters (number and size of buffers, rings, descriptors) can be modified at runtime through sysctl variables. The changes become effective when no netmap clients are active.

  The API is mostly unchanged, although the NIOCUNREGIF ioctl no longer brings the interface back to normal mode: you need to close the file descriptor for that. This change was necessary to track who is using the mapped region, and since it is a simplification of the API there was no incentive in trying to preserve NIOCUNREGIF.
  We will remove the ioctl from the kernel next time we need a real API change (and version bump).

  Among other things, buffer allocation when opening devices is now much faster: it used to take O(N^2) time, now it is linear.

  Submitted by: Giuseppe Lettieri
  Notes: svn path=/head/; revision=241719
* Improve lock and unlock symmetry | Ed Maste | 2012-08-09 | 1 | -15/+14
  - Move destruction of per-ring locks to netmap_dtor_locked to mirror the initialization that happens in NIOCREGIF. Otherwise unloading a netmap-capable interface that was never put into netmap mode would try to mtx_destroy an uninitialized mutex, and panic.
  - Destroy core_lock in netmap_detach, mirroring init in netmap_attach.
  - Also comment out the knlist_destroy for now as there is currently no knlist_init.

  Sponsored by: ADARA Networks
  Reviewed by: luigi@
  Notes: svn path=/head/; revision=239149
* Fix whitespace (missing newline) | Ed Maste | 2012-08-08 | 1 | -1/+2
  Notes: svn path=/head/; revision=239141
* Clarify comments about number of tx / rx rings | Ed Maste | 2012-08-08 | 1 | -1/+2
  Notes: svn path=/head/; revision=239140
* fix some signed/unsigned warnings in the netmap code. | Luigi Rizzo | 2012-08-02 | 1 | -4/+4
  Unfortunately the original drivers still have a lot of sign conversion/comparison warnings.
  Notes: svn path=/head/; revision=238985
* Add a newline on an error message; | Luigi Rizzo | 2012-08-02 | 1 | -7/+13
  rename linux functions to avoid confusion;
  fix error reporting on linux
  Notes: svn path=/head/; revision=238982
* - move the inclusion of netmap headers to the common part of the code; | Luigi Rizzo | 2012-07-30 | 1 | -7/+12
  - more portable annotations for unused arguments;
  Notes: svn path=/head/; revision=238912
* use __builtin_prefetch() for prefetch. | Luigi Rizzo | 2012-07-27 | 1 | -26/+160
  merge in the remaining part of the linux-specific glue so i do not need to maintain two different distributions.
  Notes: svn path=/head/; revision=238837
* define prefetch as a noop on !x86 | Luigi Rizzo | 2012-07-26 | 1 | -0/+4
  Notes: svn path=/head/; revision=238818
* Add support for VALE bridges to the netmap core, see | Luigi Rizzo | 2012-07-26 | 1 | -18/+695
  http://info.iet.unipi.it/~luigi/vale/

  VALE lets you dynamically instantiate multiple software bridges that talk the netmap API (and are *extremely* fast), so you can test netmap applications without the need for high end hardware.

  This is particularly useful as I am completing a netmap-aware version of ipfw, and VALE provides an excellent testing platform.
  Also, I have netmap backends for qemu mostly ready for commit to the port, and this too will let you interconnect virtual machines at high speed without fiddling with bridges, tap or other slow solutions.

  The API for applications is unchanged, so you can use the code in tools/tools/netmap (which I will update soon) on the VALE ports.

  This commit also syncs the code with the one in my internal repository, so you will see some conditional code for other platforms.
  The code should run mostly unmodified on stable/9 so people interested in trying it can just copy sys/dev/netmap/ and sys/net/netmap*.h from HEAD

  VALE is joint work with my colleague Giuseppe Lettieri, and is partly supported by the EU Projects CHANGE and OPENLAB
  Notes: svn path=/head/; revision=238812
* print 'netmap stack ring full' only in verbose mode. | Luigi Rizzo | 2012-05-03 | 1 | -1/+2
  Notes: svn path=/head/; revision=234986
* A bit of cleanup in the names of fields of netmap-related structures. | Luigi Rizzo | 2012-04-13 | 1 | -32/+31
  Use the name 'ring' instead of 'queue' in all fields.
  Bump NETMAP_API.
  Notes: svn path=/head/; revision=234227
* Some code restructuring to bring the memory allocator out of netmap.c | Luigi Rizzo | 2012-04-12 | 1 | -524/+31
  and make it easier to replace it with a different implementation.
  In passing, also fix indentation.

  NOTE: I know that #include "foo.c" is ugly, but the alternative (add another entry to sys/conf/files, add a separate header with structs and prototypes, and expose functions that are meant to be private) looks even worse to me.

  We need a more modular way to specify dependencies and build options.
  Notes: svn path=/head/; revision=234174
* use correct selinfo pointer for the generic interrupt handler | Luigi Rizzo | 2012-04-12 | 1 | -2/+2
  (it is never used in current FreeBSD drivers).
  Notes: svn path=/head/; revision=234169
* A couple of changes related to ixgbe operation in netmap mode: | Luigi Rizzo | 2012-04-11 | 1 | -1/+1
  - add a sysctl, dev.netmap.ix_crcstrip, to control whether ixgbe should strip the CRC on received frames. Defaults to 0, which keeps the CRC and improves performance when receiving min-sized (64-byte) frames.
    This matters because min-sized frames are one of the standard benchmarks for switches and routers: some chipsets seem to issue read-modify-write cycles for PCIe transactions that are not a full cache line, and a min-sized frame triggers the bug, resulting in reduced throughput -- 9.7 instead of 14.88 Mpps -- and heavy bus load.
  - for the time being, always look for incoming packets on a select/poll even if there has not been an interrupt in the meantime. This is only a temporary workaround for a probable race condition in keeping track of rx interrupts.
    Add a couple of diagnostic vars to help studying the problem.
  Notes: svn path=/head/; revision=234140
* A bunch of netmap fixes: | Luigi Rizzo | 2012-02-27 | 1 | -169/+228
  USERSPACE:
  1. add support for devices with different number of rx and tx queues;
  2. add better support for zero-copy operation, adding an extra field to the netmap ring to indicate how many buffers we have already processed but not yet released (with help from Eddie Kohler);
  3. The two changes above unfortunately require an API change, so while at it add a version field and some spares to the ioctl() argument to help detect mismatches.
  4. update the manual page for the two changes above;
  5. update sample applications in tools/tools/netmap

  KERNEL:
  1. simplify the internal structures moving the global wait queues to the 'struct netmap_adapter';
  2. simplify the functions that map kring<->nic ring indexes
  3. normalize device-specific code, helps maintenance;
  4. start exploring the impact of micro-optimizations (prefetch etc.) in the ixgbe driver.
     Using 'legacy' descriptors on the tx ring and prefetching slots gives about 20% speedup at 900 MHz. Another 7-10% would come from removing the explicit calls to bus_dmamap* in the core (they are effectively NOPs in this case, but it takes an expensive load of the per-buffer dma maps to figure out that they are all NULL).

  Rx performance not investigated.

  I am postponing the MFC so I can import a few more improvements before merging.
  Notes: svn path=/head/; revision=232238
* Various cleanups for readability (no functional changes) | Luigi Rizzo | 2012-02-17 | 1 | -117/+28
  - remove the KEVENT code, which was incomplete and not compiled anyways;
  - change some while() loops into for()
  - adjust indentation
  - remove extra whitespace

  MFC after: 1 week
  Notes: svn path=/head/; revision=231881
* - use struct ifnet as explicit type of the argument to the | Luigi Rizzo | 2012-02-13 | 1 | -38/+125
    txsync() and rxsync() callbacks, removing some variables made useless by this change;
  - add generic lock and irq handling routines. These can be useful in case there are no driver locks that we can reuse;
  - add a few macros to reduce differences with the Linux version.
  Notes: svn path=/head/; revision=231594
* - change the buffer size from a constant to a | Luigi Rizzo | 2012-02-08 | 1 | -409/+431
    TUNABLE variable (hw.netmap.buf_size) so we can experiment with values different from 2048 which may give better cache performance.
  - rearrange the memory allocation code so it will be easier to replace it with a different implementation.
    The current code relies on a single large contiguous chunk of memory obtained through contigmalloc.
    The new implementation (not committed yet) uses multiple smaller chunks which are easier to fit in a fragmented address space.
  Notes: svn path=/head/; revision=231198
* ixgbe changes: | Luigi Rizzo | 2012-01-26 | 1 | -0/+6
  - remove experimental code for disabling CRC
  - use the correct constant for conversion between interrupt rate and EITR values (the previous values were off by a factor of 2)
  - make dev.ix.N.queueM.interrupt_rate a RW sysctl variable. Changing individual values affects the queue immediately, and propagates to all interfaces at the next reinit.
  - add dev.ix.N.queueM.irqs rdonly sysctl, to export the actual interrupt counts

  Netmap-related changes for ixgbe:
  - use the "new" format for TX descriptors in netmap mode.
  - pass interrupt mitigation delays to the user process doing poll() on a netmap file descriptor.
    On the RX side this means we will not check the ring more than once per interrupt. This gives the process a chance to sleep and process packets in larger batches, thus reducing CPU usage.
    On the TX side we take this even further: completed transmissions are reclaimed every half ring even if the NIC interrupts more often. This saves even more CPU without any additional tx delays.

  Generic Netmap-related changes:
  - align the netmap_kring to cache lines so that there is no false sharing (possibly useful for multiqueue NICs and MSIX interrupts, which are handled by different cores). It's a minor improvement but it does not cost anything.

  Reviewed by: Jack Vogel
  Approved by: Jack Vogel
  Notes: svn path=/head/; revision=230572
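The cache-line alignment mentioned in the commit above can be illustrated with a small C11 sketch. The struct and its fields are hypothetical (this is not the real `netmap_kring`), and the 64-byte line size is an assumption that holds on common x86 parts but is not universal.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64  /* assumed cache line size in bytes */

/* Hypothetical per-ring kernel state, padded out to a full cache line. */
struct demo_kring {
    alignas(DEMO_CACHE_LINE) uint32_t hwcur;
    uint32_t hwavail;
};

/* Each array element must start on its own line, so the size must be a multiple. */
static_assert(sizeof(struct demo_kring) % DEMO_CACHE_LINE == 0,
    "demo_kring must occupy whole cache lines");

/* True if two addresses fall into the same (assumed) cache line. */
static int
demo_share_line(const void *a, const void *b)
{
    return (uintptr_t)a / DEMO_CACHE_LINE == (uintptr_t)b / DEMO_CACHE_LINE;
}
```

Because every element of an array of such structs begins on a line boundary, two cores that each update a different ring never write to the same line, which is what avoiding false sharing means here.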
* indentation and whitespace fixes | Luigi Rizzo | 2012-01-13 | 1 | -2/+2
  Notes: svn path=/head/; revision=230058
* Two performance-related fixes: | Luigi Rizzo | 2012-01-13 | 1 | -26/+3
  1. as reported by Alexander Fiveg, the allocator was reporting half of the allocated memory. Fix this by exiting from the loop earlier (not too critical because this code is going away soon).
  2. following a discussion on freebsd-current
     http://lists.freebsd.org/pipermail/freebsd-current/2012-January/031144.html
     turns out that (re)loading the dmamap was expensive and not optimized. This operation is in the critical path when doing zero-copy forwarding between interfaces.
     At least on netmap and i386/amd64, the bus_dmamap_load can be completely bypassed if the map is NULL, so we do it.

  The latter change gives an almost 3x improvement in forwarding performance, from the previous 9.5Mpps at 2.9GHz to the current line rate (14.2Mpps) at 1.733GHz. (this is for 64+4 byte packets, in other configurations the PCIe bus is a bottleneck).
  Notes: svn path=/head/; revision=230052
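The line-rate figures quoted in these commits follow from Ethernet framing arithmetic: every frame on the wire carries an extra 20 bytes (7 bytes of preamble, 1 byte of start-of-frame delimiter, 12 bytes of inter-frame gap). The sketch below works this out for a 10 Gbit/s link; `demo_line_rate_pps()` is a hypothetical helper, and reading "64+4 byte packets" as 68-byte frames is an assumption that happens to reproduce the 14.2 Mpps figure.

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame overhead on the wire: preamble (7) + SFD (1) + inter-frame gap (12). */
#define DEMO_ETH_OVERHEAD 20u

/*
 * Maximum whole packets per second on a link of link_bps bits/s
 * for frames of frame_bytes bytes (CRC included).
 */
static uint64_t
demo_line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    uint64_t bits_on_wire;

    assert(frame_bytes > 0);
    bits_on_wire = ((uint64_t)frame_bytes + DEMO_ETH_OVERHEAD) * 8u;
    return link_bps / bits_on_wire;
}
```

For 64-byte frames this gives 10^10 / 672, about 14.88 Mpps; for 68-byte frames it gives 10^10 / 704, about 14.2 Mpps.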
* other simplifications in the internal interfaces to the | Luigi Rizzo | 2012-01-10 | 1 | -11/+13
  memory allocator.
  Notes: svn path=/head/; revision=229947
* small code cleanup in preparation for future modifications in | Luigi Rizzo | 2012-01-10 | 1 | -19/+29
  the memory allocator used by netmap. No functional change, two small bug fixes:
  - in if_re.c add a missing bus_dmamap_sync()
  - in netmap.c comment out a spurious free() in an error handling block
  Notes: svn path=/head/; revision=229939
* 1. don't use if_pspare directly, but through a macro WMA() | Luigi Rizzo | 2011-12-23 | 1 | -5/+4
  2. move a variable declaration at the beginning of a block
  Notes: svn path=/head/; revision=228845
* revise the implementation of the rings connected to the host stack | Luigi Rizzo | 2011-12-05 | 1 | -27/+46
  Notes: svn path=/head/; revision=228280
* 1. Fix the handling of link reset while in netmap mode. | Luigi Rizzo | 2011-12-05 | 1 | -143/+68
     A link reset now is completely transparent for the netmap client: even if the NIC resets its own ring (e.g. restarting from 0), the client will not see any change in the current rx/tx positions, because the driver will keep track of the offset between the two.
  2. make the device-specific code more uniform across different drivers.
     There were some inconsistencies in the implementation of the netmap support routines, now drivers have been aligned to a common code structure.
  3. import netmap support for ixgbe. This is implemented as a very small patch for ixgbe.c (233 lines, 11 chunks, mostly comments: in total the patch has only 54 lines of new code), as most of the code is in an external file sys/dev/netmap/ixgbe_netmap.h, following some initial comments from Jack Vogel about making changes less intrusive.
     (Note, I have emailed Jack multiple times asking if he had comments on this structure of the code; I got no reply so I assume he is fine with it).

  Support for other drivers (em, lem, re, igb) will come later.

  "ixgbe" is now the reference driver for netmap support. Both the external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should serve as a reference for other device drivers.

  Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap, the sender does 14.88 Mpps at 1050 MHz and 14.2 Mpps at 900 MHz on an i7-860 with 4 cores and 82599 card. Haven't tried yet more aggressive optimizations such as adding 'prefetch' instructions in the time-critical parts of the code.
  Notes: svn path=/head/; revision=228276
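The offset tracking behind the transparent link reset in item 1 above can be sketched as follows. This is a simplified userspace model with hypothetical names (`demo_ring_map`, `demo_nic_to_kring()`, `demo_kring_to_nic()`, `demo_on_nic_reset()`), not the driver code; it assumes both rings have the same number of slots and that every index passed in is already smaller than `num_slots`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping between client-visible (kring) and NIC ring indexes. */
struct demo_ring_map {
    uint32_t num_slots;  /* slots in each ring */
    uint32_t offset;     /* (kring index - nic index) mod num_slots */
};

/* Translate a NIC ring index into the index the client sees. */
static uint32_t
demo_nic_to_kring(const struct demo_ring_map *m, uint32_t nic_idx)
{
    assert(nic_idx < m->num_slots);
    return (nic_idx + m->offset) % m->num_slots;
}

/* Translate a client-visible index into the NIC ring index. */
static uint32_t
demo_kring_to_nic(const struct demo_ring_map *m, uint32_t kring_idx)
{
    assert(kring_idx < m->num_slots);
    return (kring_idx + m->num_slots - m->offset) % m->num_slots;
}

/*
 * After a reset the NIC restarts at nic_new while the client is still
 * at kring_cur: recompute the offset so the client position is unchanged.
 */
static void
demo_on_nic_reset(struct demo_ring_map *m, uint32_t kring_cur, uint32_t nic_new)
{
    assert(kring_cur < m->num_slots && nic_new < m->num_slots);
    m->offset = (kring_cur + m->num_slots - nic_new) % m->num_slots;
}
```

If the client sits at slot 100 of a 256-slot ring when the NIC restarts from 0, the offset becomes 100, so NIC slot 0 keeps appearing to the client as slot 100 and nothing visible moves.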
* fix formatting warning using casts. The numbers involved | Luigi Rizzo | 2011-11-23 | 1 | -4/+4
  are small and these are debug statements, so there is no reason to obfuscate the format string with PRIsomeKINDofINTEGER
  Notes: svn path=/head/; revision=227875
* Bring in support for netmap, a framework for very efficient packet | Luigi Rizzo | 2011-11-17 | 1 | -0/+1762
  I/O from userspace, capable of line rate at 10G, see
  http://info.iet.unipi.it/~luigi/netmap/

  At this time I am bringing in only the generic code (sys/dev/netmap/ plus two headers under sys/net/), and some sample applications in tools/tools/netmap.
  There is also a manpage in share/man/man4 [1]

  In order to make use of the framework you need to build a kernel with "device netmap", and patch individual drivers with the code that you can find in
  sys/dev/netmap/head.diff
  The file will go away as the relevant pieces are committed to the various device drivers, which should happen in a few days after talking to the driver maintainers.

  Netmap support is available at the moment for Intel 10G and 1G cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re"). I have partial patches for "bge" and am starting to work on "cxgbe". Hopefully changes are trivial enough so interested third parties can submit their patches. Interested people can contact me for advice on how to add netmap support to specific devices.

  CREDITS: Netmap has been developed by Luigi Rizzo and other collaborators at the Universita` di Pisa, and supported by EU project CHANGE (http://www.change-project.eu/)
  The code is distributed under a BSD Copyright.

  [1] In my opinion it is a bad idea to have all manpages in one directory. We should place kernel documentation in the same dir that contains the code, which would make it much simpler to keep doc and code in sync, reduce the clutter in share/man/ and incidentally is the policy used for all of userspace code.
  Makefiles and doc tools can be trivially adjusted to find the manpages in the relevant subdirs.
  Notes: svn path=/head/; revision=227614