path: root/sys/dev/xen
Commit message | Author | Age | Files | Lines
* Make MAXPHYS tunable. Bump MAXPHYS to 1M. | Konstantin Belousov | 2020-11-28 | 2 | -3/+4

  Replace MAXPHYS with the runtime variable maxphys. It is initialized from MAXPHYS by default, but can also be adjusted with the tunable kern.maxphys.

  Make the b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and size b_pages[] for pbufs to atop(maxphys) + 1. The +1 for pbufs allows several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, especially when such buffers come from userspace (*). Overall, we save a significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to the desired high value.

  Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except the one place which initializes maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embedded structures, get a private MAXPHYS-like constant; their conversion is out of scope for this work.

  Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis were either submitted by, or based on changes by, mav.

  Suggested by: mav (*)
  Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D27225
  Notes: svn path=/head/; revision=368124
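
The atop(maxphys) + 1 sizing for pbufs follows from simple page arithmetic: a buffer of exactly maxphys bytes that does not start on a page boundary touches one page more than an aligned one. The following is a minimal sketch of that arithmetic, assuming 4 KiB pages; the macro and function names are hypothetical and are not taken from the FreeBSD sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define PAGE_SHIFT_SKETCH 12

/*
 * Number of pages touched by a buffer of len bytes starting at addr.
 * Hypothetical helper, written only to illustrate the "+1" above.
 */
static size_t
pages_spanned(uintptr_t addr, size_t len)
{
    uintptr_t first, last;

    assert(len > 0);
    first = addr >> PAGE_SHIFT_SKETCH;              /* page of first byte */
    last = (addr + len - 1) >> PAGE_SHIFT_SKETCH;   /* page of last byte */
    return ((size_t)(last - first + 1));
}
```

With a 1 MiB buffer, a page-aligned start address spans 256 pages, while a start address 0x200 bytes into a page spans 257, which is why an array sized to exactly atop(maxphys) entries would be one slot short for unaligned buffers.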
* Suspend all writeable local filesystems on power suspend. | Konstantin Belousov | 2020-11-05 | 1 | -0/+3

  This ensures that no writes are pending in memory, either metadata or user data, but not including dirty pages not yet converted to fs writes.

  Only filesystems declared local are suspended.

  Note that this does not guarantee absence of the metadata errors or leaks if resume is not done: for instance, on UFS unlinked but opened inodes are leaked and require fsck to gc.

  Reviewed by: markj
  Discussed with: imp
  Tested by: imp (previous version), pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D27054
  Notes: svn path=/head/; revision=367398
* Convert allocations of the phys pager to vm_pager_allocate(). | Konstantin Belousov | 2020-09-08 | 1 | -1/+2

  Future changes would require additional initialization of OBJT_PHYS objects, and vm_object_allocate() is not suitable for it.

  Reviewed by: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D24652
  Notes: svn path=/head/; revision=365485
* dev/xen: clean up empty lines in .c and .h files | Mateusz Guzik | 2020-09-01 | 14 | -85/+30

  Notes: svn path=/head/; revision=365128
* vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error | Mateusz Guzik | 2020-08-19 | 1 | -1/+1

  Most consumers pass NULL.

  Notes: svn path=/head/; revision=364372
* Remove double-calls to tc_get_timecount() to warm timecounters. | Konstantin Belousov | 2020-06-10 | 1 | -1/+0

  It seems that the second call does not add any useful state change for any of the implemented timecounters.

  Discussed with: bde
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 weeks
  Notes: svn path=/head/; revision=362033
* xen/control: short circuit xctrl_on_watch_event on spurious event | Roger Pau Monné | 2020-05-28 | 1 | -1/+1

  If there's no data to read from xenstore, short-circuit xctrl_on_watch_event to return early. There's no reason to continue, since the lack of data would prevent matching against any known event type.

  Sponsored by: Citrix Systems R&D
  MFC with: r352925
  MFC after: 1 week
  Notes: svn path=/head/; revision=361580
* xen/blkfront: use the correct type for disk sectors | Roger Pau Monné | 2020-05-28 | 1 | -4/+5

  The correct type to use to represent disk sectors is blkif_sector_t (which is a uint64_t underneath). This avoids truncation of the disk size calculation when resizing on i386, as otherwise the calculation of d_mediasize in xbd_connect is truncated to the size of unsigned long, which is 32 bits on i386.

  Note this issue didn't affect amd64, because the size of unsigned long is 64 bits there.

  Sponsored by: Citrix Systems R&D
  MFC after: 1 week
  Notes: svn path=/head/; revision=361579
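
The truncation described in this commit can be reproduced in user space. The sketch below is an illustration only: it assumes a 32-bit stand-in for unsigned long on i386, and the function names are hypothetical rather than taken from blkfront.

```c
#include <stdint.h>

/*
 * Media size computed in a 32-bit type, as a stand-in for unsigned long
 * on i386 (assumption). The product wraps modulo 2^32.
 */
static uint64_t
mediasize_narrow(uint64_t sectors, uint32_t sector_size)
{
    uint32_t s = (uint32_t)sectors;

    return ((uint32_t)(s * sector_size));
}

/* Media size computed in a 64-bit type, as with blkif_sector_t. */
static uint64_t
mediasize_wide(uint64_t sectors, uint32_t sector_size)
{
    return (sectors * (uint64_t)sector_size);
}
```

For a 10 GiB disk with 512-byte sectors (20971520 sectors), the wide computation yields 10737418240 bytes, while the narrow one wraps around to 2147483648 bytes (2 GiB).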
* dev/xenstore: fix return with locks held | Roger Pau Monné | 2020-05-20 | 1 | -5/+6

  Fix returning from xenstore device with locks held, which triggers the following panic:

    # cat /dev/xen/xenstore
    ^C
    userret: returning with the following locks held:
    exclusive sx evtchn_ringc_sx (evtchn_ringc_sx) r = 0 (0xfffff8000650be40) locked @ /usr/src/sys/dev/xen/evtchn/evtchn_dev.c:262

  Note this is not a security issue since access to the device is limited to root by default.

  Sponsored by: Citrix Systems R&D
  MFC after: 1 week
  Notes: svn path=/head/; revision=361274
* tty: convert tty_lock_assert to tty_assert_locked to hide lock type | Kyle Evans | 2020-04-17 | 1 | -1/+1

  A later change, currently being iterated on in D24459, will in fact change the lock type to an sx so that TTY drivers can sleep on it if they need to. Committing this ahead of time to make the review in question a little more palatable.

  tty_lock_assert() is unfortunately still needed for now in two places to make sure that the tty lock has not been recursed upon, for those scenarios where it's supplied by the TTY driver and possibly a mutex that is allowed to recurse.

  Suggested by: markj
  Notes: svn path=/head/; revision=360051
* Remove noise that once upon a time allowed netback to build on FreeBSD 6. The | Warner Losh | 2020-03-01 | 1 | -2/+0

  network layer has evolved since then, and this won't compile there.

  Notes: svn path=/head/; revision=358495
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (16 of many) | Pawel Biernacki | 2020-02-25 | 4 | -7/+11

  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.

  This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.

  Mark all obvious cases as MPSAFE.

  Reviewed by: royger
  Approved by: kib (mentor, blanket)
  Differential Revision: https://reviews.freebsd.org/D23638
  Notes: svn path=/head/; revision=358316
* Ever since the block layer expanded its command syntax beyond just | Scott Long | 2020-02-07 | 1 | -1/+3

  BIO_READ and BIO_WRITE, we've handled this expanded syntax poorly in drivers when the driver doesn't support a particular command. Do a sweep and fix that.

  Reported by: imp
  Notes: svn path=/head/; revision=357647
* xen/console: fix priority of Xen console | Roger Pau Monné | 2020-02-06 | 1 | -1/+2

  Currently the Xen console is always attached with priority CN_REMOTE (highest), which means that when booting with a single console the Xen console will take preference over the VGA for example, and that's not intended unless the user has also selected to use a serial console.

  Fix this by lowering the priority of the Xen console to NORMAL unless the user has selected to use a serial console. This keeps the usual FreeBSD behavior of outputting to the internal consoles (ie: VGA) when booted as a Xen dom0.

  MFC after: 3 days
  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=357616
* Introduce flag IFF_NEEDSEPOCH that marks Ethernet interfaces that | Gleb Smirnoff | 2020-01-23 | 1 | -1/+1

  supposedly may call into ether_input() without network epoch.

  They all need to be reviewed before 13.0-RELEASE. Some may need to be fixed. The flag is not planned to be used in the kernel for a long time.

  Notes: svn path=/head/; revision=357010
* Add KERNEL_PANICKED macro for use in place of direct panicstr tests | Mateusz Guzik | 2020-01-12 | 1 | -2/+2

  Notes: svn path=/head/; revision=356655
* vfs: drop the mostly unused flags argument from VOP_UNLOCK | Mateusz Guzik | 2020-01-03 | 1 | -4/+4

  Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro.

  Reviewed by: kib (previous version)
  Differential Revision: https://reviews.freebsd.org/D21427
  Notes: svn path=/head/; revision=356337
* Make page busy state deterministic on free. Pages must be xbusy when | Jeff Roberson | 2019-12-22 | 2 | -16/+11

  removed from objects including calls to free. Pages must not be xbusy when freed and not on an object. Strengthen assertions to match these expectations. In practice very little code had to change busy handling to meet these rules but we can now make stronger guarantees to busy holders and avoid conditionally dropping busy in free.

  Refine vm_page_remove() and vm_page_replace() semantics now that we have stronger guarantees about busy state. This removes redundant and potentially problematic code that has proliferated.

  Discussed with: markj
  Reviewed by: kib
  Differential Revision: https://reviews.freebsd.org/D22822
  Notes: svn path=/head/; revision=356002
* vfs: introduce v_irflag and make v_type smaller | Mateusz Guzik | 2019-12-08 | 1 | -1/+1

  The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time.

  v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED.

  Reviewed by: kib, jeff
  Differential Revision: https://reviews.freebsd.org/D22715
  Notes: svn path=/head/; revision=355537
* (4/6) Protect page valid with the busy lock. | Jeff Roberson | 2019-10-15 | 2 | -2/+2

  Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments.

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Netflix, Intel
  Differential Revision: https://reviews.freebsd.org/D21594
  Notes: svn path=/head/; revision=353539
* (1/6) Replace busy checks with acquires where it is trivial to do so. | Jeff Roberson | 2019-10-15 | 2 | -2/+2

  This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency.

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Netflix, Intel
  Differential Revision: https://reviews.freebsd.org/D21548
  Notes: svn path=/head/; revision=353535
* Remove an unneeded include of opt_sctp.h. | Mark Johnston | 2019-10-11 | 1 | -2/+0

  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=353444
* xen/ctrl: acknowledge all control requests | Roger Pau Monné | 2019-10-01 | 1 | -6/+5

  Currently only suspend requests are acknowledged by writing an empty string back to the xenstore control node, but poweroff or reboot requests are not acknowledged and FreeBSD simply proceeds to perform the desired action.

  Fix this by acknowledging all requests, and remove the suspend specific ack done in the handler.

  Sponsored by: Citrix Systems R&D
  MFC after: 3 days
  Notes: svn path=/head/; revision=352925
* Replace redundant code with a few new vm_page_grab facilities: | Jeff Roberson | 2019-09-10 | 2 | -2/+2

  - VM_ALLOC_NOCREAT will grab without creating a page.
  - vm_page_grab_valid() will grab and page in if necessary.
  - vm_page_busy_acquire() automates some busy acquire loops.

  Discussed with: alc, kib, markj
  Tested by: pho (part of larger branch)
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D21546
  Notes: svn path=/head/; revision=352176
* Change synchronization rules for vm_page reference counting. | Mark Johnston | 2019-09-09 | 2 | -8/+4

  There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficient as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures.

  Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks.

  The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate.

  vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler.

  vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller.

  The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files.

  __FreeBSD_version is bumped. The DRM ports have been updated to accommodate the KPI changes.

  Reviewed by: jeff (earlier version)
  Tested by: gallatin (earlier version), pho
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D20486
  Notes: svn path=/head/; revision=352110
* Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m). | Mark Johnston | 2019-06-07 | 1 | -1/+1

  These calls are not the same in general: the former will dequeue the page if it is enqueued, while the latter will just leave it alone. But, all existing uses of the former apply to unmanaged pages, which are never enqueued in the first place. No functional change intended.

  Reviewed by: kib
  MFC after: 1 week
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D20470
  Notes: svn path=/head/; revision=348785
* Extract eventfilter declarations to sys/_eventfilter.h | Conrad Meyer | 2019-05-20 | 2 | -0/+3

  This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially.

  EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

  As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files.

  LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change).

  No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.

  Notes: svn path=/head/; revision=347984
* Implement support for online disk capacity changes. | Pawel Jakub Dawidek | 2019-03-30 | 1 | -3/+32

  Obtained from: Fudo Security
  Tested in: AWS
  Notes: svn path=/head/; revision=345726
* Change the vm_ooffset_t type to unsigned. | Konstantin Belousov | 2018-12-02 | 1 | -4/+4

  The type represents a byte offset in the vm_object_t data space, which does not span negative offsets in FreeBSD VM. The change matches the signedness of the byte offset with the unsignedness of vm_pindex_t, the type of the page indexes in the objects.

  This allows removing the UOFF_TO_IDX() macro, which was used when we had to forcibly interpret the type as unsigned anyway. It also fixes a lot of implicit bugs in the d_mmap methods of device drivers.

  Reviewed by: alc, markj (previous version)
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=341398
* xen: temporarily disable SMAP when forwarding hypercalls from user-space | Roger Pau Monné | 2018-09-13 | 1 | -1/+13

  The Xen page-table walker used to resolve the virtual addresses in the hypercalls will refuse to access user-space pages when SMAP is enabled unless the AC flag in EFLAGS is set (just like normal hardware with SMAP support would do).

  Since privcmd allows forwarding hypercalls (and buffers) from user-space into Xen, make sure SMAP is temporarily disabled for the duration of the hypercall from user-space.

  Approved by: re (gjb)
  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=338632
* xen/netfront: Ensure curvnet is set | Kristof Provost | 2018-08-23 | 1 | -0/+4

  netfront_backend_changed() is called from the xenwatch_thread(), which means that the curvnet is not set. We have to set it before we can call things like arp_ifinit().

  PR: 230845
  Notes: svn path=/head/; revision=338256
* Make timespecadd(3) and friends public | Alan Somers | 2018-07-30 | 1 | -1/+1

  The timespecadd(3) family of macros were imported from NetBSD back in r35029. However, they were initially guarded by #ifdef _KERNEL. In the meantime, we have grown at least 28 syscalls that use timespecs in some way, leading many programs both inside and outside of the base system to redefine those macros. It's better just to make the definitions public.

  Our kernel currently defines two-argument versions of timespecadd and timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define three-argument versions. Solaris also defines a three-argument version, but only in its kernel. This revision changes our definition to match the common three-argument version.

  Bump __FreeBSD_version due to the breaking KPI change.

  Discussed with: cem, jilles, ian, bde
  Differential Revision: https://reviews.freebsd.org/D14725
  Notes: svn path=/head/; revision=336914
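
To make the two-argument versus three-argument distinction concrete, here is a minimal sketch of a three-argument addition macro, where the result goes into a separate third operand instead of overwriting the first. The macro is given a hypothetical local name (TIMESPECADD3_SKETCH) so that it cannot clash with any system definition; it is an illustration, not a copy of the FreeBSD header.

```c
#include <time.h>

/*
 * Three-argument form: *vsp = *tsp + *usp. Hypothetical local name,
 * chosen to avoid colliding with a system-provided timespecadd().
 */
#define TIMESPECADD3_SKETCH(tsp, usp, vsp) do {                 \
    (vsp)->tv_sec = (tsp)->tv_sec + (usp)->tv_sec;              \
    (vsp)->tv_nsec = (tsp)->tv_nsec + (usp)->tv_nsec;           \
    if ((vsp)->tv_nsec >= 1000000000L) {                        \
        (vsp)->tv_sec++;            /* carry into seconds */    \
        (vsp)->tv_nsec -= 1000000000L;                          \
    }                                                           \
} while (0)

/* Small wrapper so the macro can be exercised as a function. */
static struct timespec
add_timespecs(struct timespec a, struct timespec b)
{
    struct timespec sum;

    TIMESPECADD3_SKETCH(&a, &b, &sum);
    return (sum);
}
```

Adding 1.6 s and 2.7 s this way gives tv_sec = 4 and tv_nsec = 300000000, with both inputs left unmodified. A two-argument form would instead have to store the sum back into one of its operands.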
* xen/grants: fix deadlocks in the free callbacks | Roger Pau Monné | 2018-07-30 | 1 | -1/+1

  This fixes the panic caused by deadlocking when grant-table free callbacks are used.

  The cause of the recursion is: check_free_callbacks() is always called with the lock gnttab_list_lock held. In turn the callback function is also called with the lock held. Then, when the client uses any of the grant reference methods which also attempt to lock the gnttab_list_lock mutex from within the free callback, a deadlock happens.

  Fix this by making the gnttab_list_lock recursive.

  Submitted by: Pratyush Yadav <pratyush@freebsd.org>
  Differential Revision: https://reviews.freebsd.org/D16505
  Notes: svn path=/head/; revision=336897
* xen-blkfront: fix memory leak in xbd_connect error path | Roger Pau Monné | 2018-07-30 | 1 | -2/+9

  If gnttab_grant_foreign_access() fails for any of the indirection pages, the code breaks out of both the loops without freeing the local variable indirectpages, causing a memory leak.

  Submitted by: Pratyush Yadav <pratyush@freebsd.org>
  Differential Review: https://reviews.freebsd.org/D16136
  Notes: svn path=/head/; revision=336896
* xen-blkfront: fix length check | Roger Pau Monné | 2018-07-30 | 1 | -2/+2

  Length is an unsigned integer, so checking against < 0 doesn't make sense. While there, also make clear that a length of 0 always succeeds.

  Submitted by: Pratyush Yadav <pratyush@freebsd.org>
  Differential Review: https://reviews.freebsd.org/D16045
  Notes: svn path=/head/; revision=336895
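
The point about unsigned lengths can be illustrated in isolation: a comparison such as len < 0 is always false for an unsigned type, so a useful check has to be phrased differently. The following is a hedged sketch of one such check; the function name, parameters, and the bounds logic are assumptions made for illustration and are not the blkfront code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical range check for an unsigned length.
 * A zero-length request always succeeds; otherwise the range
 * [offset, offset + len) must fit below limit. The comparison is
 * arranged so that offset + len is never computed and cannot overflow.
 */
static bool
range_ok(size_t offset, size_t len, size_t limit)
{
    if (len == 0)
        return (true);
    return (offset < limit && len <= limit - offset);
}
```

Handling the zero-length case first makes the intent explicit, and the remaining test relies only on comparisons that are meaningful for unsigned values.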
* xen: attach the PV CPU if no CPU device is present | Roger Pau Monné | 2018-07-19 | 1 | -2/+2

  When booted as PVHv2, there's no ACPI CPU object, so attach the PV CPU device in order to take its place. This is required in case some device or driver tries to poke at the PCPU device field.

  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=336472
* xen: do not limit PV console usage to PV guests | Roger Pau Monné | 2018-07-19 | 1 | -8/+3

  The Xen PV console is also available to HVM and PVHv2 guests, so don't limit the console usage to PV guests only.

  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=336471
* xen: remove direct usage of HYPERVISOR_start_info | Roger Pau Monné | 2018-07-19 | 4 | -48/+53

  HYPERVISOR_start_info is only available to PV and PVHv1 guests; HVM and PVHv2 guests get this data from HVM parameters that are fetched using a hypercall.

  Instead provide a set of helper functions that should be used to fetch this data. The helper functions have different implementations depending on whether FreeBSD is running as PVHv1 or HVM/PVHv2 guest type.

  This helps to clean up generic Xen code by removing quite a lot of xen_pv_domain and xen_hvm_domain macro usages.

  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=336470
* xen-netback: fix LOR | Roger Pau Monné | 2018-06-26 | 1 | -3/+3

    lock order reversal: (sleepable after non-sleepable)
     1st 0xfffffe00357ff538 xnb_softc (xen netback softc lock) @ /usr/src/sys/dev/xen/netback/netback.c:1069
     2nd 0xffffffff81fdccb0 intrsrc (intrsrc) @ /usr/src/sys/x86/x86/intr_machdep.c:224

  There's no need to hold the lock since the cleaning of the interrupt cannot happen in parallel due to the XNBF_IN_SHUTDOWN flag being set. Note that the locking in netback needs some improvement or clarification.

  While there also remove a double newline.

  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=335664
* xen: check if there are clients waiting in gnttab_end_foreign_access_references | Roger Pau Monné | 2018-06-21 | 1 | -0/+1

  Without a call to check_free_callbacks() clients waiting for grant references would not be woken up even when there are sufficient grant references available.

  The check was likely left out as a mistake when the function was first added. Note that other functions used to free grant references already call check_free_callbacks.

  Submitted by: pratyush
  Reviewed by: royger
  Differential review: https://reviews.freebsd.org/D15899
  Notes: svn path=/head/; revision=335490
* xen/evtchn: fix LOR in evtchn device | Roger Pau Monné | 2018-05-24 | 1 | -2/+2

  Remove the device from the list before unbinding it. Doing it in this order allows calling xen_intr_unbind without holding the bind_mutex lock.

  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=334146
* xen-blkback: don't unbind the interrupt while holding the lock | Roger Pau Monné | 2018-05-24 | 1 | -2/+1

  There's no need to perform the interrupt unbind while holding the blkback lock, and doing so leads to the following LOR:

    lock order reversal: (sleepable after non-sleepable)
     1st 0xfffff8000802fe90 xbbd1 (xbbd1) @ /usr/src/sys/dev/xen/blkback/blkback.c:3423
     2nd 0xffffffff81fdf890 intrsrc (intrsrc) @ /usr/src/sys/x86/x86/intr_machdep.c:224
    stack backtrace:
    #0 0xffffffff80bdd993 at witness_debugger+0x73
    #1 0xffffffff80bdd814 at witness_checkorder+0xe34
    #2 0xffffffff80b7d798 at _sx_xlock+0x68
    #3 0xffffffff811b3913 at intr_remove_handler+0x43
    #4 0xffffffff811c63ef at xen_intr_unbind+0x10f
    #5 0xffffffff80a12ecf at xbb_disconnect+0x2f
    #6 0xffffffff80a12e54 at xbb_shutdown+0x1e4
    #7 0xffffffff80a10be4 at xbb_frontend_changed+0x54
    #8 0xffffffff80ed66a4 at xenbusb_back_otherend_changed+0x14
    #9 0xffffffff80a2a382 at xenwatch_thread+0x182
    #10 0xffffffff80b34164 at fork_exit+0x84
    #11 0xffffffff8101ec9e at fork_trampoline+0xe

  Reported by: Nathan Friess <nathan.friess@gmail.com>
  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=334145
* dev/xenstore: prevent transaction hijacking | Roger Pau Monné | 2018-05-24 | 1 | -6/+22

  The user-space xenstore device is currently lacking a check to make sure that the caller is only using transaction ids currently assigned to it. This allows users of the xenstore device to hijack transactions not started by them, although the scope is limited to transactions started by the same domain.

  Tested by: Nathan Friess <nathan.friess@gmail.com>
  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=334144
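
The kind of check this commit describes, verifying that a transaction id was handed out to the caller before acting on it, can be sketched as a small per-open-file ownership table. Everything below (type name, table size, function names) is hypothetical and only illustrates the idea; the actual driver may track transactions differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on concurrently open transactions per user. */
#define MAX_OPEN_TX_SKETCH 8

/* Per-user record of the transaction ids assigned to that user. */
struct tx_owner_sketch {
    uint32_t ids[MAX_OPEN_TX_SKETCH];
    size_t count;
};

/* Remember an id when a transaction is started on behalf of this user. */
static bool
tx_record(struct tx_owner_sketch *o, uint32_t id)
{
    if (o->count >= MAX_OPEN_TX_SKETCH)
        return (false);
    o->ids[o->count++] = id;
    return (true);
}

/* Accept a request only if its transaction id belongs to this user. */
static bool
tx_is_owned(const struct tx_owner_sketch *o, uint32_t id)
{
    for (size_t i = 0; i < o->count; i++) {
        if (o->ids[i] == id)
            return (true);
    }
    return (false);
}
```

A request carrying an id that tx_is_owned() rejects would be refused before it reaches xenstore, so one user of the device cannot continue or end a transaction started by another.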
* dev/xenstore: add support for watches | Roger Pau Monné | 2018-05-24 | 1 | -20/+248

  Allow user-space applications to register watches using the xenstore device. This is needed in order to run toolstack operations on domains different than the one where xenstore is running (in which case the device is not used, since the connection to xenstore is done using a plain socket).

  Tested by: Nathan Friess <nathan.friess@gmail.com>
  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=334142
* xenstore: don't wait with the PCATCH flag | Roger Pau Monné | 2018-05-24 | 1 | -2/+2

  Due to the current synchronous xenstore implementation in FreeBSD, we cannot return from xs_read_reply without reading a reply, or else the ring gets out of sync and the next request will read the previous reply and crash due to the type mismatch. A proper solution involves making use of the req_id field in the message and allowing multiple in-flight messages at the same time on the ring.

  Remove the PCATCH flag so that signals don't interrupt the wait.

  Tested by: Nathan Friess <nathan.friess@gmail.com>
  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=334141
* xenstore: remove the suspend sx lock | Roger Pau Monné | 2018-05-24 | 1 | -77/+4

  There's no need to prevent suspend while doing xenstore transactions; callers of transactions are supposed to be prepared for a transaction to fail.

  This fixes a bug that could be triggered from the xenstore user-space device, since starting a transaction from user-space would result in returning there with an sx lock held, which causes a WITNESS check to trigger.

  Tested by: Nathan Friess <nathan.friess@gmail.com>
  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=334140
* xen-blkback: do not use state 3 (XenbusStateInitialised) | Roger Pau Monné | 2018-05-22 | 1 | -6/+13

  Linux will not connect to a backend that's in state 3 (XenbusStateInitialised); it needs to be in state 2 (XenbusStateInitWait) for Linux to attempt to connect to the backend.

  The protocol seems to suggest that the backend should indeed wait in state 2 for the frontend to connect, which makes state 3 unusable for disk backends.

  Also make sure blkback will connect to the frontend if the frontend reaches state 3 (XenbusStateInitialised) before blkback has processed the results from the hotplug script (Submitted by Nathan Friess).

  MFC after: 1 week
  Notes: svn path=/head/; revision=334027
* ifnet: Replace if_addr_lock rwlock with epoch + mutex | Matt Macy | 2018-05-18 | 1 | -1/+1

  Run on LLNW canaries and tested by pho@

  gallatin:
  Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace.

  When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch:

    InMpps  OMpps  InGbs  OGbs      err  TCP Est   %CPU  syscalls      csw   irq  GBfree
      4.98   0.00   4.42  0.00  4235592       33  83.80   4720653  2149771  1235  247.32
      4.73   0.00   4.20  0.00  4025260       33  82.99   4724900  2139833  1204  247.32
      4.72   0.00   4.20  0.00  4035252       33  82.14   4719162  2132023  1264  247.32
      4.71   0.00   4.21  0.00  4073206       33  83.68   4744973  2123317  1347  247.32
      4.72   0.00   4.21  0.00  4061118       33  80.82   4713615  2188091  1490  247.32
      4.72   0.00   4.21  0.00  4051675       33  85.29   4727399  2109011  1205  247.32
      4.73   0.00   4.21  0.00  4039056       33  84.65   4724735  2102603  1053  247.32

  After the patch

    InMpps  OMpps  InGbs  OGbs      err  TCP Est   %CPU  syscalls      csw   irq  GBfree
      5.43   0.00   4.20  0.00  3313143       33  84.96   5434214  1900162  2656  245.51
      5.43   0.00   4.20  0.00  3308527       33  85.24   5439695  1809382  2521  245.51
      5.42   0.00   4.19  0.00  3316778       33  87.54   5416028  1805835  2256  245.51
      5.42   0.00   4.19  0.00  3317673       33  90.44   5426044  1763056  2332  245.51
      5.42   0.00   4.19  0.00  3314839       33  88.11   5435732  1792218  2499  245.52
      5.44   0.00   4.19  0.00  3293228       33  91.84   5426301  1668597  2121  245.52

  Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

  Reviewed by: gallatin
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D15366
  Notes: svn path=/head/; revision=333813
* xen: fix gntdev | Roger Pau Monné | 2018-05-02 | 1 | -7/+18

  The current interface to the gntdev in FreeBSD is wrong, and mostly worked out of luck before the PTI FreeBSD fixes, when kernel and user-space were sharing the same page tables.

  On FreeBSD, ioctls have the size of the passed struct encoded in the ioctl number, because the generic ioctl handler in the OS takes care of copying the data from user-space to kernel space, and then calls the device specific ioctl handler. Thus using ioctl structs with variable sizes is not possible.

  The fix is to turn the array of structs at the end of ioctl_gntdev_alloc_gref and ioctl_gntdev_map_grant_ref into pointers that can be properly accessed from the kernel gntdev driver using the copyin/copyout functions. Note that this is exactly how it's done for the privcmd driver.

  Sponsored by: Citrix Systems R&D
  Notes: svn path=/head/; revision=333169
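
The two-step copy described above (a fixed-size request struct copied by the generic layer, then a variable-length array fetched through a pointer by the driver) can be modelled in user space. In the sketch below, memcpy() stands in for the kernel's copy routines, and the struct and function names are hypothetical rather than the gntdev definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size request: the array is reached via a pointer. */
struct map_req_sketch {
    uint32_t count;
    const uint32_t *refs;   /* count entries living in the caller's memory */
};

/* Stand-in for the generic ioctl layer: copies exactly sizeof(struct). */
static void
generic_copy_in(struct map_req_sketch *kreq, const struct map_req_sketch *ureq)
{
    memcpy(kreq, ureq, sizeof(*kreq));
}

/* Stand-in for the driver's explicit second copy of the array. */
static int
driver_copy_refs(const struct map_req_sketch *kreq, uint32_t *out, size_t out_len)
{
    if (kreq->count == 0 || kreq->count > out_len)
        return (-1);
    memcpy(out, kreq->refs, kreq->count * sizeof(*out));
    return (0);
}

/* Runs both steps and sums the copied entries (0 on failure). */
static uint32_t
sum_refs_via_request(const uint32_t *refs, uint32_t count)
{
    struct map_req_sketch ureq = { .count = count, .refs = refs };
    struct map_req_sketch kreq;
    uint32_t buf[16], sum = 0;

    generic_copy_in(&kreq, &ureq);
    if (driver_copy_refs(&kreq, buf, 16) != 0)
        return (0);
    for (uint32_t i = 0; i < kreq.count; i++)
        sum += buf[i];
    return (sum);
}
```

Because the request struct has a constant size regardless of count, a size encoded in the ioctl number always describes it correctly; only the driver, which knows count, performs the variable-length copy.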
* Correct pseudo misspelling in sys/ comments | Ed Maste | 2018-02-23 | 2 | -4/+4

  contrib code and #define in intel_ata.h unchanged.

  Notes: svn path=/head/; revision=329873