path: root/sys/dev/mana
Commit message | Author | Age | Files | Lines
* mana: remove redundant doorbell in mana_poll_rx_cq() | Wei Hu | 2025-03-14 | 1 | -7/+0
  With the last commit to refill the rx mbuf in batch, the doorbell in mana_poll_rx_cq() becomes redundant. Remove it to save the few microseconds spent in the MMIO call.
  Reported by: NetApp
  Reviewed by: Tallamraju, Sai
  Tested by: whu
  Fixes: 9b8701b8 ("mana: refill the rx mbuf in batch")
  MFC after: 3 days
  Sponsored by: Microsoft
* mana: refill the rx mbuf in batch | Wei Hu | 2025-02-27 | 3 | -30/+114
  Set the default refill threshold to one quarter of the rx queue length. Users can change this value with hw.mana.rx_refill_thresh in loader.conf. Batching the refill improves rx completion handling, saving 10% to 15% of the overall time.
  Tested by: whu
  MFC after: 2 weeks
  Sponsored by: Microsoft
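The threshold rule described in this commit can be sketched in a few lines of C. This is only an illustration: the helper names are hypothetical and are not taken from the driver, and the exact comparison the driver uses (>= versus >) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the default threshold is one quarter of the rx queue length. */
static uint32_t
ex_default_refill_thresh(uint32_t rxq_len)
{
	return (rxq_len / 4);
}

/*
 * Hypothetical helper: refill only once enough buffers have been consumed,
 * so that the refill cost is paid once per batch instead of once per packet.
 * The ">=" comparison is an assumption.
 */
static bool
ex_should_refill(uint32_t consumed, uint32_t thresh)
{
	return (consumed >= thresh);
}
```

With a 1024-entry rx queue the default threshold would be 256, so under these assumptions a refill is triggered after 256 buffers have been consumed.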
* mana: Increase default tx and rx ring size to 1024 | Wei Hu | 2025-02-24 | 3 | -16/+119
  TCP performance tests show a high number of retries under heavy tx traffic. The numbers of queue stops and wakeups also increase. Further analysis suggests the FreeBSD network stack tends to send TSO packets with multiple sg entries, typically ranging from 10 to 16. On mana, every two sgs take one unit of tx ring. Therefore, adding one unit for the head, it takes 6 to 9 units of tx ring to send a typical TSO packet.
  The current default tx ring size is 256, which can fill up quickly under heavy load. When the tx ring is full, the send queue is stopped, waiting for ring space to be freed. This could cause the network stack to drop packets and lead to TCP retransmissions.
  Increase the default tx and rx ring size to 1024 units. Also introduce two tunables allowing users to request tx and rx ring sizes in loader.conf:
    hw.mana.rx_req_size
    hw.mana.tx_req_size
  When mana is loading, the driver checks these two values and rounds them up to a power of 2. If they are not set, or the requested values are out of the allowable range, it uses the default ring size instead.
  Also change the tx and rx single loop completion budget to 8.
  Tested by: whu
  MFC after: 2 weeks
  Sponsored by: Microsoft
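The ring-unit arithmetic and the ring-size selection described above can be sketched as follows. The function names and the EX_RING_MIN / EX_RING_MAX bounds are hypothetical; the commit message does not state the allowable range, nor whether an odd sg count rounds up, so both are assumptions here.

```c
#include <stdint.h>

#define EX_RING_MIN	64u	/* hypothetical lower bound, not from the driver */
#define EX_RING_MAX	16384u	/* hypothetical upper bound, not from the driver */
#define EX_RING_DEFAULT	1024u	/* default ring size named in the commit message */

/*
 * One ring unit for the head, plus one unit for every two sg entries.
 * Rounding an odd sg count up is an assumption.
 */
static uint32_t
ex_tx_units(uint32_t nsegs)
{
	return (1 + (nsegs + 1) / 2);
}

/*
 * Pick a ring size from a requested value. A request of 0 stands for
 * "tunable not set"; it and any out-of-range request yield the default.
 * In-range requests are rounded up to the next power of two.
 */
static uint32_t
ex_pick_ring_size(uint32_t requested)
{
	uint32_t size;

	if (requested < EX_RING_MIN || requested > EX_RING_MAX)
		return (EX_RING_DEFAULT);

	size = 1;
	while (size < requested)
		size <<= 1;
	return (size);
}
```

ex_tx_units(10) gives 6 and ex_tx_units(16) gives 9, matching the 6 to 9 units quoted in the message. At 9 units per packet, a 256-unit ring holds at most 28 such packets (256 / 9, rounded down), against 113 for a 1024-unit ring.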
* Check for errors when detaching children first, not last | John Baldwin | 2024-11-05 | 1 | -1/+6
  The detach routines in these drivers all ended with 'return (bus_generic_detach())', meaning that if any child device failed to detach, the parent driver was left in a mostly destroyed state but still marked attached. Instead, bus drivers should detach child drivers first and return errors before destroying driver state in the parent.
  Reviewed by: imp
  Differential Revision: https://reviews.freebsd.org/D47387
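The ordering this commit asks for can be illustrated with a small userspace simulation. Nothing here is kernel code: struct ex_softc, ex_detach_children() and ex_detach() are hypothetical stand-ins, with ex_detach_children() playing the role of bus_generic_detach().

```c
#include <stdbool.h>

/* Hypothetical parent driver state. */
struct ex_softc {
	bool resources_held;	/* stands in for queues, interrupts, DMA memory */
	int child_detach_err;	/* error the simulated child detach will return */
};

/* Stand-in for detaching all children; returns 0 or an error number. */
static int
ex_detach_children(struct ex_softc *sc)
{
	return (sc->child_detach_err);
}

/* Detach children first; tear down parent state only if that succeeded. */
static int
ex_detach(struct ex_softc *sc)
{
	int error;

	error = ex_detach_children(sc);
	if (error != 0)
		return (error);	/* parent state is still intact */

	sc->resources_held = false;
	return (0);
}
```

If the child detach fails, the parent returns the error with its resources untouched, so it is never left half destroyed while still marked attached.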
* mana: Remove stray semicolons | Zhenlei Huang | 2024-10-24 | 1 | -2/+2
  MFC after: 1 week
* gdma: use ispower2 | Doug Moore | 2024-09-28 | 1 | -1/+1
  It's faster to use ispower2(n) than it is to compute roundup_pow_of_two and do a comparison. So do the former.
  Reviewed by: kib
  Differential Revision: https://reviews.freebsd.org/D46838
* mana: Stop checking for failures from malloc/mallocarray/buf_ring_alloc(M_WAITOK) | Zhenlei Huang | 2024-09-03 | 3 | -77/+0
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D45852
* net: Remove unneeded NULL check for the allocated ifnet | Zhenlei Huang | 2024-06-28 | 1 | -5/+0
  Change 4787572d0580 made if_alloc_domain() never fail, so the wrappers if_alloc(), if_alloc_dev(), and if_gethandle() never fail either.
  No functional change intended.
  Reviewed by: kp, imp, glebius, stevek
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D45740
* log2: move log2 functions from linuxkpi to libkern | Doug Moore | 2024-06-24 | 1 | -12/+0
  Linux has a header file that defines an ilog2 function and some simple functions/macros that use it: roundup_pow_of_two, is_power_of_2, rounddown_pow_of_two, and order_base_2. This change moves three of those simple functions (all but is_power_of_2) from linuxkpi to libkern. It also deletes a few implementations of these functions that have previously been copied into code for various device drivers, so that they can use the libkern version. The is_power_of_2 macro was not moved because powerof2 in param.h provides almost the same service already (except that they disagree about whether 0 is a power of two).
  Since the linux definitions of these functions were copied into FreeBSD 11 years ago, linux has improved them, and this change provides those improvements. In particular, a giant table of log values for evaluating ilog2 for constant values is no longer necessary.
  Reviewed by: alc, markj (previous version)
  Differential Revision: https://reviews.freebsd.org/D45536
* dev/mana: replace power2 function | Doug Moore | 2024-06-24 | 1 | -1/+1
  Replace is_power_of_2(length) with powerof2(length). When length != 0, as in this case, they produce the same result. This will allow an implementation of is_power_of_two to be dropped.
  Reviewed by: alc, markj
  Differential Revision: https://reviews.freebsd.org/D45536
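The difference between the two checks at zero can be seen in a short userspace sketch. The functions below use ex_ prefixed names to avoid clashing with the real macros; they mirror the usual Linux-style and param.h-style definitions but are written here for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linux-style check: zero is not treated as a power of two. */
static bool
ex_is_power_of_2(uint64_t n)
{
	return (n != 0 && (n & (n - 1)) == 0);
}

/* param.h-style check: ((x - 1) & x) == 0, which also accepts zero. */
static bool
ex_powerof2(uint64_t x)
{
	return (((x - 1) & x) == 0);
}
```

For any nonzero argument the two agree, which is why the substitution is safe when length != 0.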
* mana: Use device_set_desc() | Mark Johnston | 2024-06-16 | 1 | -3/+1
  No functional change intended.
  MFC after: 1 week
* libkern: add ilog2 macro | Doug Moore | 2024-06-03 | 1 | -9/+0
  The kernel source contains several definitions of an ilog2 function; some are slower than necessary, and one of them is incorrect. Eliminate them all and define an ilog2 macro in libkern to replace them, in a way that is fast, correct for all argument types, and, in a GENERIC kernel, includes a check for an invalid zero parameter.
  Folks at Microsoft have verified that having a correct ilog2 definition for their MANA driver doesn't break it.
  Reviewed by: alc, markj, mhorne (older version), jhibbits (older version)
  Differential Revision: https://reviews.freebsd.org/D45170
  Differential Revision: https://reviews.freebsd.org/D45235
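For readers unfamiliar with ilog2, a minimal userspace version can be built on a count-leading-zeros builtin. This is not the libkern macro: ex_ilog2() is a hypothetical name, it relies on the GCC/Clang builtin __builtin_clzll, it handles only 64-bit unsigned arguments, and it leaves a zero argument undefined rather than checking for it.

```c
#include <stdint.h>

/*
 * floor(log2(x)) for x != 0: the index of the highest set bit.
 * The result is undefined for x == 0, because __builtin_clzll(0) is undefined.
 */
static int
ex_ilog2(uint64_t x)
{
	return (63 - __builtin_clzll(x));
}
```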
* mana: fix leaking pci resource problem detaching mana devices | Wei Hu | 2024-02-13 | 1 | -1/+1
  Fix the error message shown in dmesg when detaching mana gdma devices: "Device leaked memory resources".
  Reported by: NetApp
  MFC after: 3 days
  Sponsored by: Microsoft
* mana: Fix TX CQE error handling | Wei Hu | 2024-01-17 | 3 | -6/+16
  For an unknown TX CQE error type (probably from newer hardware), still free the mbuf, update the queue tail, etc.; otherwise the accounting will be wrong.
  Also, TX errors can be triggered by injecting corrupted packets, so replace the mana_err logging with mana_dbg.
  Reported by: NetApp
  MFC after: 1 week
  Sponsored by: Microsoft
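The accounting point can be shown with a small simulation in which cleanup runs for every completion, whatever its type. The enum values, struct ex_txq and ex_handle_tx_cqe() are hypothetical and do not reflect the driver's real CQE types or queue layout.

```c
#include <stdint.h>

/* Hypothetical completion types; any other value counts as "unknown". */
enum ex_cqe_type {
	EX_CQE_TX_OKAY = 1,
	EX_CQE_TX_KNOWN_ERR = 2
};

/* Hypothetical per-queue counters. */
struct ex_txq {
	uint32_t tail;		/* next WQE to reclaim */
	uint32_t freed;		/* buffers released so far */
	uint32_t errors;	/* error completions seen */
};

static void
ex_handle_tx_cqe(struct ex_txq *txq, int cqe_type)
{
	switch (cqe_type) {
	case EX_CQE_TX_OKAY:
		break;
	case EX_CQE_TX_KNOWN_ERR:
	default:
		/* Known or unknown error: count it, then continue to cleanup. */
		txq->errors++;
		break;
	}

	/* Cleanup is unconditional, so tail and buffer accounting stay in step. */
	txq->freed++;
	txq->tail++;
}
```

An unknown type such as 99 still advances the tail and releases one buffer, which is the behaviour the commit message describes.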
* sys: Automated cleanup of cdefs and other formatting | Warner Losh | 2023-11-27 | 5 | -5/+5
  Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row.
    Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
    Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
    Remove /\n+#if.*\n#endif.*\n+/
    Remove /^#if.*\n#endif.*\n/
    Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
    Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
    Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
  Sponsored by: Netflix
* mana: add lro and tso stat counters | Wei Hu | 2023-09-14 | 3 | -0/+153
  Add a few stat counters for tso and lro.
  MFC after: 3 days
  Sponsored by: Microsoft
* mana: add ioctl to support toggling offloading features | Wei Hu | 2023-09-13 | 1 | -1/+76
  With this support, users can enable or disable offloading features such as txcsum, rxcsum, tso and software lro through ifconfig.
  To enable or disable tx features, do it on the mana interface first, then on hn/netvsc to sync it up with mana. For example:
    ifconfig mana0 -txcsum
    ifconfig hn0 -txcsum
  To enable or disable rx features, applying the change on the mana interface is sufficient.
  Disabling txcsum implies disabling tso. Enabling tso when txcsum is disabled results in an error message in dmesg requesting that txcsum be enabled first.
  The above applies to ipv6 offloading features as well.
  Tested by: whu
  MFC after: 3 days
  Sponsored by: Microsoft
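The dependency between txcsum and tso can be sketched as a pair of helpers on a capability bitmask. The flag values and function names are hypothetical, and returning EINVAL is an assumption: the commit message only says that an error message appears in dmesg.

```c
#include <errno.h>
#include <stdint.h>

#define EX_CAP_TXCSUM	0x1u	/* hypothetical flag value */
#define EX_CAP_TSO	0x2u	/* hypothetical flag value */

/* Disabling txcsum also disables tso, since tso depends on it. */
static void
ex_disable_txcsum(uint32_t *caps)
{
	*caps &= ~(EX_CAP_TXCSUM | EX_CAP_TSO);
}

/* Enabling tso is refused while txcsum is off. */
static int
ex_enable_tso(uint32_t *caps)
{
	if ((*caps & EX_CAP_TXCSUM) == 0)
		return (EINVAL);	/* assumed error code */
	*caps |= EX_CAP_TSO;
	return (0);
}
```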
* mana: fix tso parameters and set hwassist bits | Wei Hu | 2023-09-04 | 2 | -3/+14
  The parameters for tso on mana were not set correctly, and the hwassist bits were not set. Together these prevented tso from working on mana. Fix both issues so that tso works on mana.
  Tested by: whu
  MFC after: 3 days
  Sponsored by: Microsoft
* mana: batch ringing RX queue doorbell on receiving packets | Wei Hu | 2023-08-28 | 2 | -2/+9
  It's inefficient to ring the doorbell page every time a WQE is posted to the receive queue. Excessive MMIO writes result in the CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
  Move the code for ringing the doorbell page to after all WQEs have been posted to the receive queue in mana_poll_rx_cq().
  In addition, use the correct WQE count for ringing the RQ doorbell. The hardware specification specifies that WQE_COUNT should be set to 0 for the Receive Queue. Although the hardware currently doesn't enforce the check, future releases may check this value.
  Tested by: whu
  MFC after: 1 week
  Sponsored by: Microsoft
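The effect of batching can be illustrated with a simulation that counts doorbell writes. The structure and function names are hypothetical; in the real driver the doorbell is an MMIO write, which is modelled here as a counter increment.

```c
#include <stdint.h>

/* Hypothetical receive queue bookkeeping. */
struct ex_rxq {
	uint32_t posted;		/* WQEs posted to the receive queue */
	uint32_t doorbell_rings;	/* simulated MMIO doorbell writes */
};

static void
ex_post_wqe(struct ex_rxq *q)
{
	q->posted++;
}

static void
ex_ring_doorbell(struct ex_rxq *q)
{
	q->doorbell_rings++;
}

/* Post n WQEs, then ring the doorbell once instead of once per WQE. */
static void
ex_post_batched(struct ex_rxq *q, uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n; i++)
		ex_post_wqe(q);
	if (n > 0)
		ex_ring_doorbell(q);
}
```

Posting eight WQEs this way costs one simulated doorbell write rather than eight.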
* sys: Remove $FreeBSD$: one-line .c pattern | Warner Losh | 2023-08-16 | 6 | -13/+0
  Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
* sys: Remove $FreeBSD$: two-line .h pattern | Warner Losh | 2023-08-16 | 6 | -12/+0
  Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
* mana: fix a KASSERT panic on recursed lock access in mana_cfg_vport | Wei Hu | 2023-08-11 | 1 | -5/+0
  The panic stack looks like this:
    panic: _sx_xlock_hard: recursed on non-recursive sx MANA port lock @ /usr/src/sys/dev/mana/mana_en.c:1022
    KDB: stack backtrace:
    vpanic() at vpanic+0x150/frame 0xfffffe011b3c1970
    panic() at panic+0x43/frame 0xfffffe011b3c19d0
    _sx_xlock_hard() at _sx_xlock_hard+0x82d/frame 0xfffffe011b3c1a70
    _sx_xlock() at _sx_xlock+0xb0/frame 0xfffffe011b3c1ab0
    mana_cfg_vport() at mana_cfg_vport+0x79/frame 0xfffffe011b3c1b40
    mana_alloc_queues() at mana_alloc_queues+0x3b/frame 0xfffffe011b3c1c50
    mana_up() at mana_up+0x40/frame 0xfffffe011b3c1c70
    mana_ioctl() at mana_ioctl+0x25b/frame 0xfffffe011b3c1cb0
    ifhwioctl() at ifhwioctl+0xd11/frame 0xfffffe011b3c1db0
    hn_xpnt_vf_init() at hn_xpnt_vf_init+0x15f/frame 0xfffffe011b3c1e10
  The lock is already held by the caller. Remove this redundant lock attempt.
  Reported by: NetApp
  Sponsored by: Microsoft
* Mechanically convert mana to IfAPI | Justin Hibbits | 2023-02-07 | 2 | -51/+51
  Reviewed by: zlei
  Sponsored by: Juniper Networks, Inc.
  Differential Revision: https://reviews.freebsd.org/D37835
* mana(4): Make the code cross-platform | Li-Wen Hsu | 2022-11-04 | 1 | -12/+3
  Discussed with: whu
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D36388
* Fix various places which cast a pointer to a uint64_t or vice versa. | John Baldwin | 2022-09-28 | 2 | -6/+6
  GCC warns about the mismatched sizes on 32-bit platforms.
  Reviewed by: imp, markj
  Differential Revision: https://reviews.freebsd.org/D36752
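The log does not show the diff, but the usual portable way to convert between a pointer and a 64-bit integer is to go through uintptr_t, which has the pointer's width on every platform. The helper names below are hypothetical and only illustrate that idiom; whether the commit used exactly this form is an assumption.

```c
#include <stdint.h>

/* Pointer to 64-bit integer: widen via uintptr_t to avoid a size-mismatch cast. */
static uint64_t
ex_ptr_to_u64(const void *p)
{
	return ((uint64_t)(uintptr_t)p);
}

/* 64-bit integer back to pointer: narrow to uintptr_t first, then convert. */
static void *
ex_u64_to_ptr(uint64_t v)
{
	return ((void *)(uintptr_t)v);
}
```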
* mana: Fix a couple i386 build errors | Wei Hu | 2022-08-29 | 1 | -0/+6
  Fix a couple i386 build errors
  Fixes: b685df314f138
  Sponsored by: Microsoft
* mana: some code refactoring and export apis for future RDMA driver | Wei Hu | 2022-08-29 | 5 | -31/+311
  - Record the physical address for doorbell page region
    To support an RDMA device with multiple user contexts, each with its own doorbell page, record the start address of the doorbell page region for use by the RDMA driver to allocate user context doorbell IDs.
  - Handle vport sharing between devices
    For outgoing packets, the PF requires the VF to configure the vport with the corresponding protection domain and doorbell ID for the kernel or user context. The vport can't be shared between different contexts. Implement the logic to exclusively take over the vport by either the Ethernet device or the RDMA device.
  - Add functions for allocating doorbell page from GDMA
    The RDMA device needs to allocate doorbell pages for each user context. Implement those functions and expose them for use by the RDMA driver.
  - Export Work Queue functions for use by RDMA driver
    The RDMA device may need to create Ethernet device queues for use by Queue Pair type RAW. This allows a user-mode context to access Ethernet hardware queues. Export the supporting functions for use by the RDMA driver.
  - Define max values for SGL entries
    The maximum number of SGL entries should be computed from the maximum WQE size for the intended queue type and the corresponding OOB data size. This guarantees the hardware queue can successfully queue requests up to the queue depth exposed to the upper layer.
  - Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
    When doing memory registration, the PF may respond with GDMA_STATUS_MORE_ENTRIES to indicate that a follow-up request is needed. This is not an error and should be processed as expected.
  - Define data structures for protection domain and memory registration
    The MANA hardware supports protection domains and memory registration for use in an RDMA environment. Add those definitions and expose them for use by the RDMA driver.
  MFC after: 2 weeks
  Sponsored by: Microsoft
* mana: add rmb load fence to comply with hw spec | Wei Hu | 2022-08-15 | 1 | -0/+4
  To ensure software reads fresh data after observing ownership bits.
  Sponsored by: Microsoft
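The ordering requirement can be illustrated in portable C11, using an acquire fence as a userspace analogue of the kernel's rmb(). This is only an analogy: struct ex_cqe and ex_read_cqe() are hypothetical, the real entries are written by the device rather than by another thread, and the driver's actual layout and fence primitive are not shown in the log.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical completion entry: payload plus ownership bits written last. */
struct ex_cqe {
	uint32_t payload;
	_Atomic uint32_t owner_bits;
};

/*
 * Read the payload only after the ownership bits show the entry is ours.
 * The acquire fence keeps the payload read from being reordered before
 * the ownership check, so stale payload data is not observed.
 */
static bool
ex_read_cqe(struct ex_cqe *cqe, uint32_t expected_owner, uint32_t *out)
{
	if (atomic_load_explicit(&cqe->owner_bits, memory_order_relaxed) !=
	    expected_owner)
		return (false);

	atomic_thread_fence(memory_order_acquire);
	*out = cqe->payload;
	return (true);
}
```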
* mana: Remove unused devclass argument to DRIVER_MODULE. | John Baldwin | 2022-05-09 | 1 | -2/+1
* mana: Add handling of CQE_RX_TRUNCATED | Wei Hu | 2022-02-15 | 1 | -1/+5
  The proper way to drop this kind of CQE is advancing rxq tail without indicating the packet to the upper network layer.
  MFC after: 2 weeks
  Sponsored by: Microsoft
* mana: Add RX fencing | Wei Hu | 2022-01-14 | 2 | -5/+68
  RX fencing allows the driver to know that any prior change to the RQs has finished, e.g. when the RQs are disabled/enabled or the hashkey/indirection table are changed, RX fencing is required.
  Remove the previous 'sleep' workaround and add the real support for RX fencing as the PF driver supports the MANA_FENCE_RQ request now (any old PF driver not supporting the request won't be used in production).
  MFC after: 2 weeks
  Sponsored by: Microsoft
* mana: fix misc minor handling issues when error happens. | Wei Hu | 2022-01-13 | 3 | -9/+17
  - In mana_create_txq(), if the test fails we must free some resources, as in all the other error handling paths of this function.
  - In mana_gd_read_cqe(), add a warning log in case of CQE read overflow, instead of failing silently.
  - Fix error handling in mana_create_rxq() when cq->gdma_id >= gc->max_num_cqs.
  - In mana_init_port(), use the correct port index rather than 0.
  - In mana_hwc_create_wq(), if allocating the DMA buffer fails, mana_hwc_destroy_wq was called without previously storing the pointer to the queue. To avoid leaking the pointer to the queue, store it as soon as it is allocated.
  MFC after: 2 weeks
  Sponsored by: Microsoft
* mana: Improve the HWC error handling | Wei Hu | 2022-01-13 | 2 | -45/+33
  Currently when the HWC creation fails, the error handling is flawed, e.g. if mana_hwc_create_channel() -> mana_hwc_establish_channel() fails, the resources acquired in mana_hwc_init_queues() are not released.
  Enhance mana_hwc_destroy_channel() to do the proper cleanup work and call it accordingly.
  MFC after: 2 weeks
  Sponsored by: Microsoft
* Mana: report OS info to PF driver | Wei Hu | 2022-01-10 | 1 | -0/+11
  The PF driver might use the OS info for statistical purposes.
  MFC after: 2 weeks
  Sponsored by: Microsoft
* mana: Fix a typo in a source code comment | Gordon Bergling | 2021-11-03 | 1 | -1/+1
  - s/maxium/maximum/
  MFC after: 1 week
* Mana: move mana polling from EQ to CQ | Wei Hu | 2021-10-26 | 5 | -214/+193
  - Each CQ starts a task queue to poll when a completion happens. This means every rx and tx queue has its own cleanup task thread to poll the completion.
  - Arm the EQ every time, whether it is mana or hwc. CQ arming depends on the budget.
  - Fix a warning in mana_poll_tx_cq() when cqe_read is 0.
  - Move cqe_poll from the EQ to the CQ struct.
  - Support EQ sharing up to 8 vPorts.
  - Ease the linkdown message from mana_info to mana_dbg.
  Tested by: whu
  MFC after: 2 weeks
  Sponsored by: Microsoft
* mana: Cast an unused value to void to quiet a warning. | John Baldwin | 2021-09-25 | 1 | -1/+1
  This appeases a -Wunused-value warning from GCC 9.
  Reviewed by: whu
  Differential Revision: https://reviews.freebsd.org/D31948
* Remove unused function mana_reset_counters. | Wei Hu | 2021-08-20 | 1 | -9/+0
  This fixes the build warning caused by this function.
  Reported by: markj
  Tested by: whu
  MFC after: 2 weeks
  Sponsored by: Microsoft
* Microsoft Azure Network Adapter (MANA) VF support | Wei Hu | 2021-08-20 | 12 | -0/+8223
  MANA is the new network adapter from Microsoft which will be available in the Azure public cloud. It provides an SRIOV NIC as a virtual function to guest OSes running on Hyper-V.
  The code can be divided into two major parts. gdma_main.c brings up the hardware board and drives all of the underlying hardware queue infrastructure. mana_en.c contains all of the main ethernet driver code.
  It has only been tested and is only supported on the amd64 architecture.
  PR: 256336
  Reviewed by: decui@microsoft.com
  Tested by: whu
  MFC after: 2 weeks
  Relnotes: yes
  Sponsored by: Microsoft
  Differential Revision: https://reviews.freebsd.org/D31150