src-test - FreeBSD source tree

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Always enable TXOK interrupts when setting up TX queues for EDMA NICs.	Adrian Chadd	2013-04-11	1	-1/+7
	Notes: svn path=/head/; revision=249386
*	Some TX dmamap cleanups.	Adrian Chadd	2013-04-02	1	-1/+23
	* Don't use BUS_DMA_ALLOCNOW for descriptor DMA maps; we never use bounce buffers for the descriptors themselves.
	* Add some XXX's to mark where the ath_buf has its mbuf ripped from underneath it without actually cleaning up the dmamap. I haven't audited those particular code paths to see if the DMA map is guaranteed to be set up there; I'll do that later.
	* Print out a warning if the descdma tidyup code is given some descriptors w/ maps to free. Ideally the owner will free the mbufs and unmap the descriptors before freeing the descriptor/ath_buf pairs, but right now that's not guaranteed to be done.
	Reviewed by: scottl (BUS_DMA_ALLOCNOW tag)
	Notes: svn path=/head/; revision=248999
*	Ensure that we only call the busdma unmap/flush routines once, when	Adrian Chadd	2013-04-01	1	-18/+31
	the buffer is being freed.
	* When buffers are cloned, the original mapping isn't copied, but the mapping wasn't being freed until later. To be safe, free the mapping when the buffer is cloned.
	* ath_freebuf() no longer calls the busdma sync/unmap routines.
	* ath_tx_freebuf() now calls sync/unmap.
	* Call sync first, before calling unmap.
	Tested:
	* AR5416, STA mode
	Notes: svn path=/head/; revision=248988
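The ordering and "only once" rules in this commit can be modelled in userland. The following is a minimal sketch, not the driver's code: `demo_dmamap`, `demo_buf`, `demo_map_release()`, `demo_buf_clone()` and `demo_tx_freebuf()` are hypothetical names, the counters stand in for the real busdma sync and unload calls, and the `loaded` guard is an assumption about how "only once" might be enforced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a busdma map; it only counts calls. */
struct demo_dmamap {
	bool loaded;       /* true while a DMA mapping is live */
	int  sync_calls;   /* how often the sync step ran */
	int  unload_calls; /* how often the unmap step ran */
};

/* Hypothetical stand-in for an ath_buf: a payload plus its DMA map. */
struct demo_buf {
	struct demo_dmamap map;
	void *payload;     /* plays the role of the mbuf */
};

/* Release a live mapping at most once: sync first, then unmap. */
static void
demo_map_release(struct demo_dmamap *m)
{
	assert(m != NULL);
	if (!m->loaded)
		return;        /* nothing mapped, so nothing to undo */
	m->sync_calls++;   /* sync before ... */
	m->unload_calls++; /* ... unmapping */
	m->loaded = false;
}

/* Clone: the payload moves to the copy, the mapping does not. */
static void
demo_buf_clone(struct demo_buf *dst, struct demo_buf *src)
{
	dst->payload = src->payload;
	dst->map = (struct demo_dmamap){ 0 }; /* clone starts unmapped */
	demo_map_release(&src->map);          /* free the original mapping now */
	src->payload = NULL;
}

/* TX free path: the one place that syncs and unmaps. */
static void
demo_tx_freebuf(struct demo_buf *bf)
{
	demo_map_release(&bf->map);
	bf->payload = NULL;
}
```

Freeing a buffer that was already cloned, or freeing the same buffer twice, leaves the counters untouched in this model, which is the property the commit is after.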
*	Remove an un-needed comment.	Adrian Chadd	2013-04-01	1	-6/+0
	Notes: svn path=/head/; revision=248986
*	Use ATH_MAX_SCATTER rather than ATH_TXDESC.	Adrian Chadd	2013-04-01	1	-1/+1
	ATH_MAX_SCATTER is used to size the ath_buf DMA segment array. We thus should use it when checking sizes of things.
	Notes: svn path=/head/; revision=248985
*	Add per-TXQ EDMA FIFO staging queue support.	Adrian Chadd	2013-03-26	1	-16/+60
	Each set of frames pushed into a FIFO is represented by a list of ath_bufs - the first ath_buf in the FIFO list is marked with ATH_BUF_FIFOPTR; the last ath_buf in the FIFO list is marked with ATH_BUF_FIFOEND.
	Multiple lists of frames are just glued together in the TAILQ as per normal - except that at the end of a FIFO list, the descriptor link pointer will be NULL and it'll be tagged with ATH_BUF_FIFOEND.
	For non-EDMA chipsets this is a no-op - the ath_txq frame list (axq_q) stays the same and is treated the same.
	For EDMA chipsets the frames are pushed into axq_q and then when the FIFO is to be (re)filled, frames will be moved onto the FIFO queue and then pushed into the FIFO.
	So:
	* Add a new queue in each hardware TXQ (ath_txq) for staging FIFO frame lists. It's a TAILQ (like the normal hardware frame queue) rather than the ath9k list-of-lists to represent FIFO entries.
	* Add new ath_buf flags - ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND.
	* When allocating ath_buf entries, clear out the flag value before returning it or it'll end up having stale flags.
	* When cloning ath_buf entries, only clone ATH_BUF_MGMT. Don't clone the FIFO related flags.
	* Extend ath_tx_draintxq() to first drain the FIFO staging queue, _then_ drain the normal hardware queue.
	Tested:
	* AR9280, hostap
	* AR9280, STA
	* AR9380/AR9580 - hostap
	TODO:
	* Test on other chipsets, just to be thorough.
	Notes: svn path=/head/; revision=248745
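The staging scheme described above can be illustrated with the TAILQ macros from `<sys/queue.h>`. This is a rough userland sketch under stated assumptions: the `demo_*` types and functions and the `DEMO_BUF_*` bit values are hypothetical, the `pending` queue plays the role of axq_q, and nothing here touches hardware descriptors or link pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical bit values; the driver's real flag values may differ. */
#define DEMO_BUF_FIFOEND 0x1u
#define DEMO_BUF_FIFOPTR 0x2u

struct demo_buf {
	TAILQ_ENTRY(demo_buf) link;
	unsigned flags;
	int id;
};
TAILQ_HEAD(demo_bufq, demo_buf);

struct demo_txq {
	struct demo_bufq pending; /* plays the role of axq_q */
	struct demo_bufq fifo;    /* FIFO staging queue */
};

static void
demo_txq_init(struct demo_txq *q)
{
	assert(q != NULL);
	TAILQ_INIT(&q->pending);
	TAILQ_INIT(&q->fifo);
}

/*
 * Move up to 'max' frames from the pending queue onto the staging
 * queue as one FIFO list: the first frame is tagged FIFOPTR and the
 * last FIFOEND. Returns the number of frames moved.
 */
static int
demo_fifo_stage(struct demo_txq *q, int max)
{
	struct demo_buf *bf, *last = NULL;
	int n = 0;

	while (n < max && (bf = TAILQ_FIRST(&q->pending)) != NULL) {
		TAILQ_REMOVE(&q->pending, bf, link);
		bf->flags &= ~(DEMO_BUF_FIFOPTR | DEMO_BUF_FIFOEND);
		if (n == 0)
			bf->flags |= DEMO_BUF_FIFOPTR;
		TAILQ_INSERT_TAIL(&q->fifo, bf, link);
		last = bf;
		n++;
	}
	if (last != NULL)
		last->flags |= DEMO_BUF_FIFOEND;
	return n;
}

/* Drain the staging queue first, then the pending queue. */
static int
demo_txq_drain(struct demo_txq *q)
{
	struct demo_buf *bf;
	int n = 0;

	while ((bf = TAILQ_FIRST(&q->fifo)) != NULL) {
		TAILQ_REMOVE(&q->fifo, bf, link);
		bf->flags = 0;
		n++;
	}
	while ((bf = TAILQ_FIRST(&q->pending)) != NULL) {
		TAILQ_REMOVE(&q->pending, bf, link);
		bf->flags = 0;
		n++;
	}
	return n;
}
```

A list of one frame ends up carrying both tags, and because each list is simply appended to the same TAILQ, consecutive FIFO lists are delimited only by their flags.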
*	Overhaul the TXQ locking (again!) as part of some beacon/cabq timing	Adrian Chadd	2013-03-24	1	-10/+21
	related issues.
	Moving the TX locking under one lock made things easier to progress on but it had one important side-effect - it increased the latency when handling CABQ setup when sending beacons.
	This commit introduces a bunch of new changes and a few unrelated changes that are just easier to lump in here.
	The aim is to have the CABQ locking separate from other locking. The CABQ transmit path in the beacon process thus doesn't have to grab the general TX lock, reducing lock contention/latency and making it more likely that we'll make the beacon TX timing.
	The second half of this commit is the CABQ related setup changes needed for sane looking EDMA CABQ support. Right now the EDMA TX code naively assumes that only one frame (MPDU or A-MPDU) is being pushed into each FIFO slot. For the CABQ this isn't true - a whole list of frames is being pushed in - and thus CABQ handling breaks very quickly.
	The aim here is to set up the CABQ list and then push _that list_ to the hardware for transmission. I can then extend the EDMA TX code to stamp that list as being "one" FIFO entry (likely by tagging the last buffer in that list as "FIFO END") so the EDMA TX completion code correctly tracks things.
	Major:
	* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than a single lock.
	* Leave the software queue side of things under the ATH_TX_LOCK lock, (continuing) to serialise things as they are.
	* Add a new function which is called whenever there's a beacon miss, to print out some debugging. This is primarily designed to help me figure out if the beacon miss events are due to a noisy environment, issues with the PHY/MAC, or other.
	* Move the CABQ setup/enable to occur _after_ all the VAPs have been looked at. This means that for multiple VAPs in bursted mode, the CABQ gets primed once all VAPs are checked, rather than being primed on the first VAP and then having frames appended after this.
	Minor:
	* Add a (disabled) twiddle to let me enable/disable cabq traffic. It's primarily there to let me easily debug what's going on with beacon and CABQ setup/traffic; there are some DMA engine hangs which I'm finally trying to trace down.
	* Clear bf_next when flushing frames; it should quieten some warnings that show up when a node goes away.
	Tested:
	* AR9280, STA/hostap, up to 4 vaps (staggered)
	* AR5416, STA/hostap, up to 4 vaps (staggered)
	TODO:
	* (Lots) more AR9380 and later testing, as I may have missed something here.
	* Leverage this to fix CABQ handling for AR9380 and later chips.
	* Force bursted beaconing on the chips that default to staggered beacons and ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being correctly set when chaining descriptors.)
	Notes: svn path=/head/; revision=248671
*	Break out the RX completion path into "FIFO check / refill" and	Adrian Chadd	2013-03-19	1	-4/+9
	"complete RX frames."
	The 128 entry RX FIFO is really easy to fill up and miss refilling when it's done in the ath taskq - as that gets blocked up doing RX completion, TX completion and other random things.
	So the 128 entry RX FIFO now gets emptied and refilled in the ath_intr() task (and it grabs / releases locks, so now ath_intr() can't just be a FAST handler yet!) but the locks aren't held for very long.
	The completion part is done in the ath taskqueue context.
	Details:
	* Create a new completed frame list - sc->sc_rx_rxlist;
	* Split the EDMA RX process queue into two halves - one that processes the RX FIFO and refills it with new frames; another that completes the completed frame list;
	* When tearing down the driver, flush whatever is in the deferred queue as well as what's in the FIFO;
	* Create two new RX methods - one that processes all RX queues, one that processes the given RX queue. When MSI is implemented, we get told which RX queue the interrupt came in on so we can specifically schedule that. (And I can do that with the non-MSI path too; I'll figure that out later.)
	* Convert the legacy code over to use these new RX methods;
	* Replace all the instances of the RX taskqueue enqueue with a call to a relevant RX method to enqueue one or all RX queues.
	Tested:
	* AR9380, STA
	* AR9580, STA
	* AR5413, STA
	Notes: svn path=/head/; revision=248529
*	Add locking around the new holdingbf code.	Adrian Chadd	2013-03-15	1	-4/+7
	Since this is being done during buffer free, it's a crap shoot whether the TX path lock is held or not. I tried putting the ath_freebuf() code inside the TX lock and I got all kinds of locking issues - it turns out that the buffer free path sometimes is called with the lock held and sometimes isn't. So I'll go and fix that soon.
	Hence for now the holdingbf buffers are protected by the TXBUF lock.
	Notes: svn path=/head/; revision=248311
*	Implement "holding buffers" per TX queue rather than globally.	Adrian Chadd	2013-03-14	1	-47/+61
	When working on TDMA, Sam Leffler found that the MAC DMA hardware would re-read the last TX descriptor when getting ready to transmit the next one. Thus the whole ATH_BUF_BUSY came into existence - the descriptor must be left alone (very specifically the link pointer must be maintained) until the hardware has moved onto the next frame.
	He saw this in TDMA because the MAC would be frequently stopping during active transmit (ie, when it wasn't its turn to transmit.)
	Fast-forward to today. It turns out that this is a problem not with a single MAC DMA instance, but with each QCU (from 0->9). They each maintain separate descriptor pointers and will re-read the last descriptor when starting to transmit the next.
	So when your AP is busy transmitting from multiple TX queues, you'll (more) frequently see one QCU stopped, waiting for a higher-priority QCU to finish transmitting, before it'll go ahead and continue. If you mess up the descriptor (ie by freeing it) then you're out of luck.
	Thanks to rpaulo for sticking with me whilst I diagnosed this issue that he was quite reliably triggering in his environment.
	This is a reimplementation; it doesn't have anything in common with the ath9k or the Qualcomm Atheros reference driver.
	Now - it in theory doesn't apply on the EDMA chips, as long as you push one complete frame into the FIFO at a time. But the MAC can DMA from a list of frames pushed into the hardware queue (ie, you concat 'n' frames together with link pointers, and then push the head pointer into the TXQ FIFO.) Since that's likely how I'm going to implement CABQ handling in hostap mode, it's likely that I will end up teaching the EDMA TX completion code about busy buffers, just to be "sure" this doesn't creep up.
	Tested - iperf ap->sta and sta->ap (with both sides running this code):
	* AR5416 STA
	* AR9160/AR9220 hostap
	To validate that it doesn't break the EDMA (FIFO) chips:
	* AR9380, AR9485, AR9462 STA
	Using iperf with the -S <tos byte decimal value> to set the TCP client side DSCP bits, mapping to different TIDs and thus different TX queues.
	TODO:
	* Make this work on the EDMA chips, if we end up pushing lists of frames to the hardware (eg how we eventually will handle cabq in hostap/ibss mode.)
	Notes: svn path=/head/; revision=248264
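One way to picture a per-queue holding buffer is below. The commit message does not spell out the mechanism, so this is a hedged sketch of one plausible reading: the most recently completed buffer on each queue is parked rather than freed, and it is released only when that same queue completes its next buffer. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ath_buf. */
struct demo_buf {
	int id;
	int freed; /* set once the buffer has gone back to the free list */
};

/* Hypothetical per-queue state: one parked ("holding") buffer each. */
struct demo_txq {
	struct demo_buf *holding;
};

static void
demo_buf_free(struct demo_buf *bf)
{
	bf->freed = 1;
}

/*
 * Completion handler for one queue. The buffer that just completed
 * is parked, since the hardware may still re-read its descriptor;
 * the buffer parked by the previous completion on this same queue
 * is now safe to free.
 */
static void
demo_tx_complete(struct demo_txq *txq, struct demo_buf *bf)
{
	struct demo_buf *old;

	assert(txq != NULL && bf != NULL);
	old = txq->holding;
	txq->holding = bf;
	if (old != NULL)
		demo_buf_free(old);
}
```

With a single global holding buffer, a completion on any queue would release a buffer parked by a different queue. Keeping the slot in the queue structure means only progress on the same queue can release it.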
*	Print out the queue flags during a TX DMA shutdown.	Adrian Chadd	2013-03-09	1	-2/+5
	Notes: svn path=/head/; revision=248090
*	Add in the STBC TX/RX capability support into the HAL and driver.	Adrian Chadd	2013-02-27	1	-0/+22
	The HAL already included the STBC fields; it just needed to be exposed to the driver and net80211 stack.
	This should allow single-stream STBC TX and RX to be negotiated; however the driver and rate control code currently don't do anything with it.
	Notes: svn path=/head/; revision=247366
*	Part #2 of the TX chainmask changes:	Adrian Chadd	2013-02-25	1	-0/+33
	* Remove ar5416UpdateChainmasks();
	* Remove the TX chainmask override code from the ar5416 TX descriptor setup routines;
	* Write a driver method to calculate the current chainmask based on the operating mode and update the driver state;
	* Call the HAL chainmask method before calling ath_hal_reset();
	* Use the currently configured chainmask in the TX descriptors rather than the hardware TX chainmasks.
	Tested:
	* AR5416, STA/AP mode - legacy and 11n modes
	Notes: svn path=/head/; revision=247287
*	Disable debugging entries about BAW issues. I haven't seen any issues	Adrian Chadd	2013-02-21	1	-0/+2
	to do with BAW tracking in the last 9 months or so.
	Notes: svn path=/head/; revision=247135
*	Add an option to allow the minimum number of delimiters to be tweaked.	Adrian Chadd	2013-02-21	1	-0/+1
	This is primarily for debugging purposes.
	Tested:
	* AR5416, STA mode
	Notes: svn path=/head/; revision=247087
*	Add a new option to limit the maximum size of aggregates.	Adrian Chadd	2013-02-21	1	-0/+1
	The default is to limit them to what the hardware is capable of.
	Add sysctl twiddles for both the non-RTS and RTS protected aggregate generation.
	Whilst here, add some comments about stuff that I've discovered during my exploration of the TX aggregate / delimiter setup path from the reference driver.
	Notes: svn path=/head/; revision=247085
*	Enable TX FIFO underrun interrupts. This allows the TX FIFO threshold	Adrian Chadd	2013-02-20	1	-0/+1
	adjustment code to now run.
	Tested:
	* AR5416, STA
	TODO:
	* Much more thorough testing on the other chips, AR5210 -> AR9287
	Notes: svn path=/head/; revision=247028
*	oops, tab!	Adrian Chadd	2013-02-20	1	-1/+1
	Notes: svn path=/head/; revision=247027
*	Post interrupts in the ath alq trace.	Adrian Chadd	2013-02-20	1	-0/+4
	Notes: svn path=/head/; revision=247026
*	CFG_ERR, DATA_UNDERRUN and DELIM_UNDERRUN are all flags, rather than	Adrian Chadd	2013-02-20	1	-6/+13
	part of ts_status. Thus:
	* make sure we decode them from ts_flags, rather than ts_status;
	* make sure we decode them regardless of whether there's an error or not.
	This correctly exposes descriptor configuration errors, TX delimiter underruns and TX data underruns.
	Notes: svn path=/head/; revision=247025
*	Pull out the if_transmit() work and revert back to ath_start().	Adrian Chadd	2013-02-13	1	-441/+100
	My changes had some rather significant behavioural changes to throughput.
	The two issues I noticed:
	* With if_start and the ifnet mbuf queue, any temporary latency would get eaten up by some mbufs being queued. With ath_transmit() queuing things to ath_buf's, I'd only get 512 TX buffers before I couldn't queue any further frames.
	* There's also some non-zero latency involved with TX being pushed into a taskqueue via direct dispatch. Any time the scheduler didn't immediately schedule the ath TX task, extra latency would result. Various 1ge/10ge drivers implement both direct dispatch (if the TX lock can be acquired) and deferred task transmission (if the TX lock can't be acquired), with frames being pushed into a drbr queue. I'll have to do this at some point, but until I figure out how to deal with 802.11 fragments, I'll have to wait a while longer.
	So what I saw:
	* lots of extra latency, especially under load - if the taskqueue wasn't immediately scheduled, things went pear shaped;
	* any extra latency would result in TX ath_buf's taking their sweet time being replenished, so any further calls to ath_transmit() would drop mbufs.
	* .. yes, there's no explicit backpressure here - things are just dropped. Eek.
	With this, the general performance has gone up, but those subtle if_start() related race conditions are back. For some reason, this is doubly-obvious with the AR5416 NIC and I don't quite understand why yet.
	There's an unrelated issue with AR5416 performance in STA mode (it's fine in AP mode when bridging frames, weirdly..) that requires a little further investigation. Specifically - it works fine on a Lenovo T40 (single core CPU) running a March 2012 9-STABLE kernel, but a Lenovo T60 (dual core) running an early November 2012 kernel behaves very poorly. The same hardware with an AR9160 or AR9280 behaves perfectly.
	Notes: svn path=/head/; revision=246745
*	Go back to direct-dispatch of the software queue and frame TX paths	Adrian Chadd	2013-02-11	1	-7/+9
	when they're being called from the TX completion handler.
	Going (back) through the taskqueue is just adding extra locking and latency to packet operations.
	This improves performance a little bit on most NICs. It still hasn't restored the original performance of the AR5416 NIC but the AR9160, AR9280 and later NICs behave very well with this.
	Tested:
	* AR5416 STA (still tops out at ~ 70mbit TCP, rather than 150mbit TCP..)
	* AR9160 hostap (good for both TX and RX)
	* AR9280 hostap (good for both TX and RX)
	Notes: svn path=/head/; revision=246650
*	Create a new TX lock specifically for queuing frames.	Adrian Chadd	2013-02-07	1	-14/+8
	This now separates out the act of queuing frames from the act of running TX and TX completion.
	Notes: svn path=/head/; revision=246453
*	Methodize the process of adding the software TX queue to the taskqueue.	Adrian Chadd	2013-02-07	1	-2/+2
	Move it (for now) to the TX taskqueue.
	Notes: svn path=/head/; revision=246450
*	Migrate the TX sending code out from under the ath0 taskq and into	Adrian Chadd	2013-01-26	1	-3/+19
	the separate ath0 TX taskq.
	Whilst here, make sure that the TX software scheduler is also running out of the TX task, rather than the ath0 taskqueue.
	Make sure that the tx taskqueue is blocked/unblocked as necessary.
	This allows for a little more parallelism on multi-core machines, as well as (eventually) supporting a higher task priority for TX tasks, allowing said TX task to preempt an already running RX or TX completion task.
	Tested:
	* AR5416, AR9280 hostap and STA modes
	Notes: svn path=/head/; revision=245927
*	Fix hangs (exposed by spectral scan activity) in STA mode when the	Adrian Chadd	2013-01-17	1	-1/+32
	chip hangs.
	* Always do a reset in ath_bmiss_proc(), regardless of whether the hardware is "hung" or not. Specifically, for spectral scan, there's likely a whole bunch of potential hangs that we don't (yet) recognise in the HAL. So to avoid RX deafness persisting until the station disassociates, just do a no-loss reset.
	* Set sc_beacons=1 in STA mode. During a reset, the beacon programming isn't done. (It's likely I need to set sc_syncbeacons during a hang reset, but I digress.) Thus after a reset, there's no beacon timer programming to send a BMISS interrupt if beacons aren't heard .. thus if the AP disappears, you won't get notified and you'll have to reset your interface.
	This hasn't yet fixed all of the hangs that I've seen when debugging spectral scan, but it's certainly reduced the hang frequency and it should improve general STA stability in very noisy environments.
	Tested:
	* AR9280, STA mode, spectral scan off/on
	PR: kern/175227
	Notes: svn path=/head/; revision=245556
*	Implement frame (data) transmission using if_transmit(), rather than	Adrian Chadd	2013-01-15	1	-98/+435
	if_start().
	This removes the overlapping data path TX from occurring, which solves quite a number of the potential TX queue races in ath(4). It doesn't fix the net80211 layer TX queue races and it doesn't fix the raw TX path yet, but it's an important step towards this.
	This hasn't dropped the TX performance in my testing; primarily because now the TX path can quickly queue frames and continue along processing.
	This involves a few rather deep changes:
	* Use the ath_buf as a queue placeholder for now, as we need to be able to support queuing a list of mbufs (ie, when transmitting fragments) and m_nextpkt can't be used here (because it's what is joining the fragments together)
	* if_transmit() now simply allocates the ath_buf and queues it to a driver TX staging queue.
	* TX is now moved into a taskqueue function.
	* The TX taskqueue function now dequeues and transmits frames.
	* Fragments are handled correctly here - as the current API passes the fragment list as one mbuf list (joined with m_nextpkt) through to the driver if_transmit().
	* For the couple of places where ath_start() may be called (mostly from net80211 when starting the VAP up again), just reimplement it using the new enqueue and taskqueue methods.
	What I don't like (about this work and the TX code in general):
	* I'm using the same lock for the staging TX queue management and the actual TX. This isn't required; I'm just being slack.
	* I haven't yet moved TX to a separate taskqueue (but the taskqueue is created); it's easy enough to do this later if necessary. I just need to make sure it's a higher priority queue, so TX has the same behaviour as it used to (where it would preempt existing RX..)
	* I need to re-review the TX path a little more and make sure that ieee80211_node_() functions aren't called within the TX lock. When queueing, I should just push failed frames into a queue and when I'm wrapping up the TX code, unlock the TX lock and call ieee80211_free_node() on each. It would be nice if I could hold the TX lock for the entire TX and TX completion, rather than this release/re-acquire behaviour. But that requires that I shuffle around the TX completion code to handle actual ath_buf free and net80211 callback/free outside of the TX lock. That's one of my next projects.
	* the ic_raw_xmit() path doesn't use this yet - so it still has sequencing problems with parallel, overlapping calls to the data path. I'll fix this later.
	Tested:
	* Hostap - AR9280, AR9220
	* STA - AR5212, AR9280, AR5416
	Notes: svn path=/head/; revision=245465
*	Add a new (skeleton) spectral mode manager module.	Adrian Chadd	2013-01-02	1	-0/+25
	Notes: svn path=/head/; revision=244951
*	Fix typo in comment.	Baptiste Daroussin	2012-12-28	1	-1/+1
	Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
	Notes: svn path=/head/; revision=244790
*	Delete the per-TXQ locks and replace them with a single TX lock.	Adrian Chadd	2012-12-02	1	-21/+23
	I couldn't think of a way to maintain the hardware TXQ locks _and_ layer on top of that per-TXQ software queuing and any other kind of fine-grained locks (eg per-TID, or per-node locks.)
	So for now, to facilitate some further code refactoring and development as part of the final push to get software queue ps-poll and u-apsd handling into this driver, just do away with them entirely.
	I may eventually bring them back at some point, when it looks architecturally cleaner to do so. But as it stands at present, it's not really buying us much:
	* in order to properly serialise things and not get bitten by scheduling and locking interactions with things higher up in the stack, we need to wrap the whole TX path in a long held lock. Otherwise we can end up being pre-empted during frame handling, resulting in some out of order frame handling between sequence number allocation and encryption handling (ie, the seqno and the CCMP IV get out of sequence);
	* .. so whilst that's the case, holding the lock for that long means that we're acquiring and releasing the TXQ lock _inside_ that context;
	* And we also acquire it per-frame during frame completion, but we currently can't hold the lock for the duration of the TX completion as we need to call net80211 layer things with the locks _unheld_ to avoid LOR.
	* .. the other places where we grab that lock are reset/flush, which don't happen often.
	My eventual aim is to change the TX path so that all rejected frame transmissions and all frame completions make their ieee80211_free_node() calls outside of the TX lock; then I can cut back on the amount of locking that goes on here.
	There may be some LORs that occur when ieee80211_free_node() is called when the TX queue path fails; I'll begin to address these in follow-up commits.
	Notes: svn path=/head/; revision=243786
*	Call if_free() with the correct vnet context if and only if ifp_vnet	Adrian Chadd	2012-11-28	1	-2/+7
	isn't NULL.
	If the attach fails prematurely and there's no if_vnet context, calling CURVNET_SET(ifp->if_vnet) is going to dereference a NULL pointer.
	Notes: svn path=/head/; revision=243648
*	ALQ logging enhancements:	Adrian Chadd	2012-11-16	1	-0/+14
	* upon setup, tell the alq code what the chip information is.
	* add TX/RX path logging for legacy chips.
	* populate the tx/rx descriptor length fields with a best-estimate. It's overly big (96 bytes when AH_SUPPORT_AR5416 is enabled) but it'll do for now.
	Whilst I'm here, add CURVNET_RESTORE() here during probe/attach as a partial solution to fixing crashes during attach when the attach fails. There are other attach failures that I have to deal with; those'll come later.
	Notes: svn path=/head/; revision=243162
*	Correctly fix the 'scan during STA mode' crash.	Adrian Chadd	2012-11-11	1	-0/+7
	Notes: svn path=/head/; revision=242899
*	Don't compile in my (not yet committed) ath_alq code unless ATH_DEBUG_ALQ	Adrian Chadd	2012-11-07	1	-3/+3
	is defined.
	This will unbreak ATH_DEBUG builds.
	Notes: svn path=/head/; revision=242698
*	Disable my software queue TIM and PS handling for now.	Adrian Chadd	2012-11-07	1	-0/+37
	ps-poll is totally broken in its current form. This should unbreak things enough to let people use PS-POLL devices, but leave it in place for me to finish PS-POLL handling.
	Notes: svn path=/head/; revision=242690
*	Add a new HAL call to extract out the HAL enterprise bits from the	Adrian Chadd	2012-11-03	1	-0/+7
	AR9300 HAL.
	Notes: svn path=/head/; revision=242527
*	I give up - introduce a TX lock to serialise TX operations.	Adrian Chadd	2012-10-31	1	-1/+11
	I've tried serialising TX using queues and such but unfortunately due to how this interacts with the locking going on elsewhere in the networking stack, the TX task gets delayed, resulting in quite a noticeable throughput loss:
	* baseline TCP for 2x2 11n HT40 is ~ 170mbit/sec;
	* TCP for TX task in the ath taskq, with the RX also going on - 80mbit/sec;
	* TCP for TX task in a separate, second taskq - 100mbit/sec.
	So for now I'm going with the Linux wireless stack approach - lock tx early. The Linux code does it in the wireless stack, before the 802.11 state stuff happens and before it's punted to the driver. But TX locking needs to also occur at the driver layer as the TX completion code _also_ begins to drain the ifnet TX queue.
	Whilst I'm here, add some KTR traces for the TX path.
	Note:
	* This really should be done at the net80211 layer (as well, at least.) But that'll have to wait for a little more thought to happen.
	Notes: svn path=/head/; revision=242391
*	Begin fleshing out some software queue awareness for TIM handling with	Adrian Chadd	2012-10-28	1	-0/+244
	the power save queue.
	* introduce some new ATH_NODE lock protected fields, tracking the net80211 psq and TIM state;
	* when doing buffer transitions - ie, when sending and completing buffers - check the state of the SWQ and update the TIM appropriately.
	* when clearing the TIM bit, if the SWQ is not empty then delay clearing it.
	This is racy, but it's no less racy than the current net80211 power save queue management code. Specifically, with multiple TX threads, it's quite plausible that parallel state updates will race and the TIM will be left in an inconsistent state. I'll address that in a follow-up commit.
	Notes: svn path=/head/; revision=242271
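The deferred TIM clear can be modelled with a small state machine. This sketch is an illustration only: the `demo_node` fields and both functions are hypothetical, it is single-threaded, and it deliberately ignores the locking and the races that the commit message itself calls out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node state for the TIM bookkeeping. */
struct demo_node {
	int  swq_depth;      /* frames still in the driver software queue */
	bool tim_set;        /* current state of the node's TIM bit */
	bool clear_deferred; /* a clear was requested while the SWQ had frames */
};

/* The stack asks for the TIM bit to be set or cleared. */
static void
demo_tim_request(struct demo_node *n, bool enable)
{
	assert(n != NULL);
	if (enable) {
		n->tim_set = true;
		n->clear_deferred = false;
		return;
	}
	if (n->swq_depth > 0) {
		/* Frames are still queued in software: delay the clear. */
		n->clear_deferred = true;
		return;
	}
	n->tim_set = false;
	n->clear_deferred = false;
}

/* A software-queued frame has completed: re-check the TIM state. */
static void
demo_swq_complete(struct demo_node *n)
{
	assert(n != NULL);
	if (n->swq_depth > 0)
		n->swq_depth--;
	if (n->swq_depth == 0 && n->clear_deferred) {
		n->tim_set = false;
		n->clear_deferred = false;
	}
}
```

The bit stays set for as long as the software queue holds frames, and the deferred clear takes effect on the completion that empties it.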
*	Add a temporary (for values of "temporary") work around for hotplug	Adrian Chadd	2012-10-28	1	-1/+9
	support with ath(4) and VIMAGE.
	Right now the VIMAGE code doesn't supply a default vnet context during:
	* hotplug attach;
	* any device detach.
	It special cases kldload/boot time probing (by setting the context to vnet0) but that doesn't occur when probing devices during a bus rescan - eg, adding a cardbus card.
	These will eventually go away when the VIMAGE support extends to providing default contexts to hotplug attach/detach.
	Notes: svn path=/head/; revision=242258
*	Push the actual TX processing into the ath taskqueue, rather than having	Adrian Chadd	2012-10-14	1	-14/+33
	it run out of multiple concurrent contexts.
	Right now the ath(4) TX processing is a bit hairy. Specifically:
	* It was running out of ath_start(), which could occur from multiple concurrent sending processes (as if_start() can be started from multiple sending threads nowadays.. sigh)
	* during RX if fast frames are enabled (so not really at the moment, not until I fix this particular feature again..)
	* during ath_reset() - so anything which calls that
	* during ath_tx_proc() in the ath taskqueue - ie, TX is attempted again after TX completion, as there's now hopefully some ath_bufs available.
	Then, the ic_raw_xmit() method can queue raw frames for transmission at any time, from any net80211 TX context. Ew.
	This has caused packet ordering issues in the past - specifically, there's absolutely no guarantee that preemption won't occur _during_ ath_start() by the TX completion processing, which will call ath_start() again. It's a mess - 802.11 really, really wants things to be in sequence or things go all kinds of loopy.
	So:
	* create a new task struct for TX'ing;
	* make the if_start method simply queue the task on the ath taskqueue;
	* make ath_start() just be called by the new TX task;
	* make ath_tx_kick() just schedule the ath TX task, rather than directly calling ath_start().
	Now yes, this means that I've taken a step backwards in terms of concurrency - TX -and- RX now occur in the same single-task taskqueue. But there's nothing stopping me from separating out the TX / TX completion code into a separate taskqueue which runs in parallel with the RX path, if that ends up being appropriate for some platforms.
	This fixes the CCMP/seqno concurrency issues that creep up when you transmit large amounts of uni-directional UDP traffic (>200MBit) on a FreeBSD STA -> AP, as now there's only one TX context no matter what's going on (TX completion->retry/software queue, userland->net80211->ath_start(), TX completion -> ath_start()); but it won't fix any concurrency issues between raw transmitted frames and non-raw transmitted frames (eg EAPOL frames on TID 16 and any other TID 16 multicast traffic that gets put on the CABQ.) That is going to require a bunch more re-architecture before it's feasible to fix.
	In any case, this is a big step towards making the majority of the TX path locking irrelevant, as now almost all TX activity occurs in the taskqueue.
	Phew.
	Notes: svn path=/head/; revision=241559
*	Initialise an uninitialised variable.	Adrian Chadd	2012-10-05	1	-1/+2
\| \| \| \|	Notes: svn path=/head/; revision=241229
*	Pause and unpause the software queues for a given node based on the	Adrian Chadd	2012-10-03	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	net80211 node power save state. * Add an ATH_NODE_UNLOCK_ASSERT() check * Add a new node field - an_is_powersave * Pause/unpause the queue based on the node state * Attempt to handle net80211 concurrency issues so the queue doesn't get paused/unpaused more than once at a time from the net80211 power save code. Whilst here (and breaking my usual rule), set CLRDMASK when a queue is unpaused, regardless of whether the queue has some pending traffic. This means the first frame from that TID (now or later) will have CLRDMASK set. Also whilst here, bump the swretrymax counters whenever the filtered frames code expires a frame. Again, breaking my rule, but this is just a statistics thing rather than a functional change. This doesn't fix ps-poll (but it doesn't break it too much worse than it is at the present) or correct the TID updates. That's next on the list. Tested: * AR9220 AP (Atheros AP96 reference design) * Macbook Pro and LG Optimus 1 Android phone, both setting and clearing power save state (but not using PS-POLL.) Notes: svn path=/head/; revision=241170
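A minimal sketch of the "pause or unpause at most once per state change" idea, assuming a per-node flag guards the transition. Only the field name `an_is_powersave` comes from the commit message; `struct sim_node`, `sim_node_powersave()` and the `paused` and `clrdmask` members are hypothetical, and locking is left out.

```c
#include <stdbool.h>

/* Hypothetical per-node state; only an_is_powersave is named in the commit. */
struct sim_node {
	bool an_is_powersave;	/* last power save state acted upon */
	int paused;		/* pause depth of the software queues */
	int clrdmask;		/* next frame should carry CLRDMASK */
};

/*
 * Apply a power save state change reported by the stack.
 * Returns true if the queues were actually paused or unpaused,
 * false if the node was already in the requested state.
 */
static bool
sim_node_powersave(struct sim_node *an, bool enable)
{
	if (an->an_is_powersave == enable)
		return (false);		/* duplicate notification: ignore */

	an->an_is_powersave = enable;
	if (enable) {
		an->paused++;
	} else {
		an->paused--;
		/* Set on every unpause, whether or not traffic is pending. */
		an->clrdmask = 1;
	}
	return (true);
}
```

The flag check is what keeps a repeated notification from pausing the queue twice and leaving it stuck after a single unpause.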
*	Migrate the ath(4) KTR logging to use an ATH_KTR() macro.	Adrian Chadd	2012-09-24	1	-8/+39
\| \| \| \| \| \| \| \| \| \|	This should eventually be unified with ATH_DEBUG() so I can get both from one macro; that may take some time. Add some new probes for TX and TX completion. Notes: svn path=/head/; revision=240899
*	Remove TDMA #define entries from if_ath.c; they now exist in if_ath_tdma.h.	Adrian Chadd	2012-09-09	1	-16/+0
\| \| \| \|	Notes: svn path=/head/; revision=240254
*	There's no need to allocate a DMA map just before calling bus_dmamem_alloc().	Adrian Chadd	2012-08-29	1	-11/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In fact, bus_dmamem_alloc() happily NULLs the dmat pointer passed in, before replacing it with its own. This fixes a MIPS crash when kldload'ing if_ath/if_ath_pci - bus_dmamap_destroy() was passed in a NULL dmat pointer and was doing all kinds of very bad things. Reviewed by: scottl Notes: svn path=/head/; revision=239865
*	Implement a sequential descriptor ID value and stuff it in the ath_buf.	Adrian Chadd	2012-08-15	1	-0/+8
\| \| \| \| \| \| \| \|	This will be used by the EDMA TX code to assign descriptor IDs in order to provide some debugging. Notes: svn path=/head/; revision=239282
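A sequential descriptor ID of this kind could look roughly like the following. The struct and field names (`sim_softc`, `sim_buf`, `next_descid`, `descid`) are hypothetical, and the 16-bit width with wrap-around is an assumption made for the sketch rather than something the commit message states.

```c
#include <stdint.h>

/* Hypothetical device state: the next ID to hand out. */
struct sim_softc {
	uint16_t next_descid;
};

/* Hypothetical buffer: carries the ID it was given, for debugging. */
struct sim_buf {
	uint16_t descid;
};

/*
 * Stamp a buffer with the next sequential ID. The counter is an
 * unsigned 16-bit value, so it wraps from 65535 back to 0.
 */
static void
sim_assign_descid(struct sim_softc *sc, struct sim_buf *bf)
{
	bf->descid = sc->next_descid;
	sc->next_descid = (uint16_t)(sc->next_descid + 1);
}
```

An ID that travels with the buffer lets a completion record be matched against the frame that was queued, which is the debugging use the commit describes.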
*	Break out the TX completion code into a separate function, so it can be	Adrian Chadd	2012-08-14	1	-35/+62
\| \| \| \| \| \| \| \| \| \| \| \|	re-used by the upcoming EDMA TX completion code. Make ath_stoptxdma() public, again so the EDMA TX code can use it. Don't check for the TXQ bitmap in the ISR when doing EDMA work as it doesn't apply for EDMA. Notes: svn path=/head/; revision=239262
*	Revert the ath_tx_draintxq() method, and instead teach it the minimum	Adrian Chadd	2012-08-12	1	-3/+24
\| \| \| \| \| \| \| \| \| \| \| \|	necessary to "do" EDMA. It was just using the TX completion status for logging information about the descriptor completion. Since with EDMA we don't know this without checking the TX completion FIFO, we can't provide this information. So don't. Notes: svn path=/head/; revision=239205
*	Break out ath_draintxq() into a method and un-methodize ath_tx_processq().	Adrian Chadd	2012-08-12	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that I understand what's going on with this, I've realised that it's going to be quite difficult to implement a processq method in the EDMA case. Because there's a separate TX status FIFO, I can't just run processq() on each EDMA TXQ to see what's finished. I have to actually run the TX status queue and handle individual TXQs. So: * unmethodize ath_tx_processq(); * leave ath_tx_draintxq() as a method, as it only uses the completion status for debugging rather than actively completing the frames (ie, all frames here are failed); * Methodize ath_draintxq(). The EDMA ath_draintxq() will have to take care of running the TX completion FIFO before (potentially) freeing frames in the queue. The only two places where ath_tx_draintxq() (on a single TXQ) is used: * ath_draintxq(); and * the CABQ handling in the beacon setup code - it drains the CABQ before populating the CABQ with frames for a new beacon (when doing multi-VAP operation.) So it's quite possible that once I methodize the CABQ and beacon handling, I can just drop ath_tx_draintxq() in its entirety. Finally, it's also quite possible that I can remove the ath_tx_draintxq() method in the future and just "teach" it to not check the status when doing EDMA. Notes: svn path=/head/; revision=239204
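To illustrate why a per-queue processq() does not map onto EDMA, here is a user-space sketch in which completions arrive on one shared status FIFO and are dispatched to individual TXQs, and a drain runs that FIFO before failing whatever is left. Every name here (`sim_edma`, `sim_txq`, `sim_fifo_push()`, `sim_process_status_fifo()`, `sim_draintxq()`) is hypothetical, and the FIFO layout is an assumption, not the hardware format.

```c
#define SIM_NUM_TXQ	4
#define SIM_FIFO_LEN	8

/* Hypothetical completion record: which queue finished a frame, and how. */
struct sim_status {
	int qnum;
	int ok;
};

struct sim_txq {
	int depth;		/* frames still queued to hardware */
	int completed;
	int failed;
};

struct sim_edma {
	struct sim_txq txq[SIM_NUM_TXQ];
	struct sim_status fifo[SIM_FIFO_LEN];	/* shared by all queues */
	int head;
	int count;
};

/* Append a completion record; returns -1 on a bad queue or a full FIFO. */
static int
sim_fifo_push(struct sim_edma *e, int qnum, int ok)
{
	struct sim_status *st;

	if (qnum < 0 || qnum >= SIM_NUM_TXQ || e->count == SIM_FIFO_LEN)
		return (-1);
	st = &e->fifo[(e->head + e->count) % SIM_FIFO_LEN];
	st->qnum = qnum;
	st->ok = ok;
	e->count++;
	return (0);
}

/* Walk the shared FIFO in order, retiring one frame per record. */
static int
sim_process_status_fifo(struct sim_edma *e)
{
	int handled = 0;

	while (e->count > 0) {
		struct sim_status st = e->fifo[e->head];
		struct sim_txq *q = &e->txq[st.qnum];

		e->head = (e->head + 1) % SIM_FIFO_LEN;
		e->count--;
		if (q->depth == 0)
			continue;	/* stale record: nothing queued */
		q->depth--;
		if (st.ok)
			q->completed++;
		else
			q->failed++;
		handled++;
	}
	return (handled);
}

/* Drain: consume completions first, then fail everything still queued. */
static int
sim_draintxq(struct sim_edma *e)
{
	int i, nfailed = 0;

	(void)sim_process_status_fifo(e);
	for (i = 0; i < SIM_NUM_TXQ; i++) {
		nfailed += e->txq[i].depth;
		e->txq[i].failed += e->txq[i].depth;
		e->txq[i].depth = 0;
	}
	return (nfailed);
}
```

Running the FIFO before freeing frames is what keeps an already-completed frame from being counted as failed during the drain.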
*	Extend the beacon code slightly to support AP mode beaconing for the	Adrian Chadd	2012-08-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EDMA HAL hardware. * The EDMA HAL code assumes the nexttbtt and intval values are in TU/8 units, rather than TU. For now, just "hack" around that here, at least until I code up something to translate it in the HAL. * Set up some different TXQ flags for EDMA hardware. * The EDMA HAL doesn't support setting the first rate series via ath_hal_setuptxdesc() - instead, a call to ath_hal_set11nratescenario() is always required. So for now, just do an 11n rate series setup for EDMA beacon frames. This allows my AR9380 to successfully transmit beacon frames. However, CABQ TX and all normal data frame TX and TX completion are still not functional and will require some more significant code churn to make work. Notes: svn path=/head/; revision=239201
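The unit workaround amounts to simple arithmetic: a value counted in TU/8 units is eight times the same duration counted in TU. A sketch, assuming plain timer values with no flag bits folded into them; `sim_tu_to_tu8()`, `sim_beacon_timers()` and the `is_edma` parameter are hypothetical names, not the driver's.

```c
#include <stdint.h>

/* One TU is eight TU/8 units, so the conversion is a multiply by 8. */
static uint32_t
sim_tu_to_tu8(uint32_t tu)
{
	return (tu << 3);
}

/* Hypothetical pair of beacon timer values handed to the HAL. */
struct sim_beacon_timers {
	uint32_t nexttbtt;
	uint32_t intval;
};

/*
 * Scale both timers only for EDMA hardware; other hardware keeps TU.
 * Assumes the inputs are plain TU counts without any flag bits set.
 */
static struct sim_beacon_timers
sim_beacon_timers(uint32_t nexttbtt_tu, uint32_t intval_tu, int is_edma)
{
	struct sim_beacon_timers t;

	t.nexttbtt = is_edma ? sim_tu_to_tu8(nexttbtt_tu) : nexttbtt_tu;
	t.intval = is_edma ? sim_tu_to_tu8(intval_tu) : intval_tu;
	return (t);
}
```

For example, a 100 TU beacon interval becomes 800 in TU/8 units on the EDMA path and stays 100 otherwise.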