| Commit message (Collapse) | Author | Age | Files | Lines |
The new PRIV_VMM_CREATE and DESTROY permissions should be allowed by
jails, so they need to be added to the list in prison_priv_check(). Then,
modify vmmdev_create() to verify that the jail was created with the
allow.vmm flag. This is already verified when opening /dev/vmmctl, but
checking again doesn't hurt and ensures that one can't pass the
allow.vmm policy by passing a vmmctl fd along a unix domain socket from
outside the jail.
Rename vmm_priv_check() to vmm_jail_priv_check() to make the function's
purpose clearer.
Reported by: novel
Reviewed by: bnovkov
Fixes: d4c05edd410e ("vmm: Add privilege checks to vmmctl operations")
Differential Revision: https://reviews.freebsd.org/D56119
|
For now, just describe the error where an unprivileged user attempts to
run a VM without DESTROY_ON_CLOSE semantics, i.e., monitor mode.
Reviewed by: bnovkov
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D54743
|
- Add the vmm group.
- Let /dev/vmmctl belong to the vmm group by default, and give group
write permissions.
- When creating a VM's device files, make them owned by the creating
process' effective UID.
Reviewed by: bnovkov
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D54741
|
In preparation for supporting creation of VMs by unprivileged users, add
some restrictions:
- Disallow creation of non-transient VMs by unprivileged users. That
is, if an unprivileged user creates a VM, the VM must be destroyed
automatically once the last fd referencing it is gone.
- Disallow destroying VMs created by a different user, unless the caller
has the PRIV_VMM_DESTROY privilege.
Reviewed by: bnovkov
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D54740
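
The two restrictions above can be modelled in userspace as a pair of
policy checks. This is only a sketch: the structures, the flag name and
the EPERM return value are assumptions made for illustration, and the
"privileged" field stands in for a successful kernel privilege check.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical flag; not the kernel's definition. */
#define SKETCH_DESTROY_ON_CLOSE	0x1

struct sketch_cred {
	uid_t uid;
	bool privileged;	/* stands in for a passing privilege check */
};

struct sketch_vm {
	uid_t owner;		/* UID of the creating process */
};

/* Unprivileged callers may only create transient VMs. */
static int
sketch_can_create(const struct sketch_cred *cr, int flags)
{
	if (!cr->privileged && (flags & SKETCH_DESTROY_ON_CLOSE) == 0)
		return (EPERM);
	return (0);
}

/* Only the creator, or a privileged caller, may destroy a VM. */
static int
sketch_can_destroy(const struct sketch_cred *cr, const struct sketch_vm *vm)
{
	if (cr->uid != vm->owner && !cr->privileged)
		return (EPERM);
	return (0);
}
```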
|
After commit e11768e94787 ("vmm: Add PRIV_DRIVER checks for passthru
ioctls"), it is not possible to use PCI passthru from jails, as
PRIV_DRIVER is not granted to jails. Apparently some users expect this
to work, understanding that jailing bhyve provides little security
benefit in this configuration.
I believe we should disable ppt access in jails even when allow.vmm is
configured. To provide an escape hatch for users, add a new
allow.vmm_ppt jail configuration knob, and check it when handling ppt
ioctls in jails. Also add a new PRIV_VMM_PPTDEV to replace the use of
PRIV_DRIVER.
PR: 292750
Reviewed by: corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D55066
|
sc->vm is unconditionally dereferenced earlier in this function. No
functional change intended.
Reviewed by: bnovkov
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D55069
|
vmmdev_create() increments the VM count as its last step and calls
vmmdev_destroy() if it fails. However, vmmdev_destroy() unconditionally
decrements the count.
Correct this bug by reordering operations.
Fixes: 1092ec8b3375 ("kern: Introduce RLIMIT_VMM")
Reviewed by: bnovkov
Differential Revision: https://reviews.freebsd.org/D55068
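
A userspace model of the accounting problem, assuming one plausible
ordering in which the count is taken before any step that can fail. The
function names and the ENXIO value are invented for the sketch; the
actual reordering in the commit may differ.

```c
#include <errno.h>
#include <stdbool.h>

static int sketch_vm_count;	/* stands in for the VM count */

/* Teardown unconditionally gives back one unit of the count. */
static void
sketch_destroy(void)
{
	sketch_vm_count--;
}

/*
 * Take the count first, so that every failure path ending in
 * sketch_destroy() releases something that was actually acquired.
 */
static int
sketch_create(bool later_step_fails)
{
	sketch_vm_count++;
	if (later_step_fails) {
		sketch_destroy();
		return (ENXIO);
	}
	return (0);
}
```

With the increment as the last step instead, a failed create would leave
the count at -1 after the unconditional decrement.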
|
Reported by: novel
Reviewed by: bnovkov
Fixes: e758074458df ("vmm: Move the module load handler to vmm_dev.c")
Differential Revision: https://reviews.freebsd.org/D54750
|
Required when KTR is configured.
Remove the pcpu.h include while here, as it seems to be unneeded.
Reported by: Jenkins
Fixes: 5f13d6b60740 ("vmm: Move common accessors and vm_eventinfo into sys/dev/vmm")
|
Now that struct vm and struct vcpu are defined in headers, provide
inline accessors. We could just remove the accessors outright, but they
don't hurt and it would result in unneeded churn.
As a part of this, consolidate definitions related to struct
vm_eventinfo as well. I'm not sure if struct vm_eventinfo is really
needed anymore, now that vmmops_run implementations can directly access
vm and vcpu fields, but this can be resolved later.
No functional change intended.
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53586
|
Now that the machine-independent fields of struct vm and struct vcpu are
available in a header, we can move lots of duplicated code into
sys/dev/vmm/vmm_vm.c. This change does exactly that.
No functional change intended.
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53585
|
There is quite a lot of duplication of code between amd64, arm64 and
riscv with respect to VM and vCPU state management. This is a bit
tricky to resolve since struct vm and struct vcpu are private to vmm.c
and both structures contain a mix of machine-dependent and
machine-independent fields.
To allow deduplication without also introducing a lot of churn, follow
the approach of struct pcpu and 1) lift the definitions of those
structures into a new header, sys/dev/vmm/vmm_vm.h, and 2) define
machine-dependent macros, VMM_VM_MD_FIELDS and VMM_VCPU_MD_FIELDS which
lay out the machine-dependent fields.
One disadvantage of this approach is that the two structures are no
longer private to vmm.c, but I think this is acceptable.
No functional change intended. A follow-up change will move a good deal
of machine/vmm/vmm.c into sys/dev/vmm/vmm_vm.c.
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53584
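
The macro-based layout can be sketched as follows. The SKETCH_ names and
all field names are hypothetical; only the general shape (a shared
structure definition that expands a machine-dependent field macro) is
taken from the description above.

```c
#include <stdint.h>

/* Would live in a machine-dependent header; fields are made up. */
#define SKETCH_VM_MD_FIELDS \
	void *md_cookie; \
	uint64_t md_flags

/* Would live in the shared header. */
struct sketch_vm {
	char name[32];		/* machine-independent */
	uint16_t maxcpus;	/* machine-independent */
	SKETCH_VM_MD_FIELDS;	/* laid out by the MD macro */
};

/* With the layout visible, accessors can be inline functions. */
static inline uint16_t
sketch_vm_maxcpus(const struct sketch_vm *vm)
{
	return (vm->maxcpus);
}
```

Machine-independent code only touches the named MI fields, while each
platform supplies its own macro body.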
|
This change adds the ability to tie a virtual machine's lifecycle to
a /dev/vmmctl file descriptor. A user can request `vmmctl` to destroy a
virtual machine on close using the `VMMCTL_CREATE_DESTROY_ON_CLOSE` flag
when creating the virtual machine. `vmmctl` tracks such virtual machines
in per-descriptor lists.
Differential Revision: https://reviews.freebsd.org/D53729
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
MFC after: 3 months
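
A minimal userspace model of per-descriptor tracking, assuming a simple
singly linked list. The types and function names are invented for the
sketch and say nothing about how the driver stores its lists.

```c
#include <stdlib.h>

struct sketch_vm {
	struct sketch_vm *next;
	int id;
};

/* Per-open-descriptor state: VMs to tear down when it is closed. */
struct sketch_ctl {
	struct sketch_vm *on_close;
};

static int sketch_live_vms;

static struct sketch_vm *
sketch_create(struct sketch_ctl *ctl, int id, int destroy_on_close)
{
	struct sketch_vm *vm = calloc(1, sizeof(*vm));

	if (vm == NULL)
		return (NULL);
	vm->id = id;
	sketch_live_vms++;
	if (destroy_on_close) {
		/* Remember the VM on this descriptor's list. */
		vm->next = ctl->on_close;
		ctl->on_close = vm;
	}
	return (vm);
}

/* Called when the descriptor goes away. */
static void
sketch_close(struct sketch_ctl *ctl)
{
	struct sketch_vm *vm;

	while ((vm = ctl->on_close) != NULL) {
		ctl->on_close = vm->next;
		free(vm);
		sketch_live_vms--;
	}
}
```

VMs created without the flag are not on the list and survive the close.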
|
This change introduces a new per-UID limit for controlling the
number of vmm instances, in anticipation of unprivileged bhyve.
This allows us to limit the amount of kernel memory allocated
by the vmm driver and prevent potential memory exhaustion attacks.
Differential Revision: https://reviews.freebsd.org/D53728
Reviewed by: markj, olce, corvink
MFC after: 3 months
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
|
No functional change intended.
Reviewed by: corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53477
|
vm_create() is only called from one place. Rather than having similar
checks everywhere, move them to vmmdev_create().
We can safely assume that the name is nul-terminated, since the vmmctl
ioctl handler and the legacy sysctl handler ensure this. So, don't
bother with strnlen().
Finally, make sure that the name buffers are the same size on all
platforms. VM_MAX_NAMELEN is supposed to be the maximum, not including
the nul terminator.
Reviewed by: corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53422
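
The buffer sizing convention can be sketched as below. The constant's
value, the rejection of empty names and the EINVAL return are
assumptions for illustration only.

```c
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_NAMELEN	8	/* illustrative; excludes the terminator */

/* A buffer sized for a maximum-length name plus its terminator. */
struct sketch_req {
	char name[SKETCH_MAX_NAMELEN + 1];
};

static int
sketch_check_name(const char *name)
{
	size_t len = strlen(name);	/* caller guarantees termination */

	if (len == 0 || len > SKETCH_MAX_NAMELEN)
		return (EINVAL);
	return (0);
}
```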
|
Move the vmm_initialized check out of vm_create() and into the legacy
sysctl handler. If vmm_initialized is false, /dev/vmmctl will not be
available and so cannot be used to create VMs.
Introduce new MD vmm_modinit() and vmm_modcleanup() routines which
handle MD (de)initialization.
No functional change intended.
Reviewed by: corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53421
|
We can free the mask earlier, simplifying some error paths. No
functional change intended.
Reviewed by: corvink, jhb, emaste
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D53418
|
In vm_mmap_memseg(), use vm_map_insert() instead of vm_map_find().
Existing callers expect to map the GPA that they passed, whereas
vm_map_find() merely treats the GPA as a hint. Also check for overflow
and remove a test for first < 0 since "first" is unsigned.
In vmm_mmio_alloc(), return an error number instead of an object
pointer, since the sole caller doesn't need the pointer. As in
vm_mmap_memseg(), use vm_map_insert() instead of vm_map_find() and
validate parameters. This function is not directly reachable via
ioctl(), but we ought to be careful anyway.
Reviewed by: corvink, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53246
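
The kind of parameter validation described above might look like the
following userspace sketch. The function name, the "limit" parameter and
the EINVAL return value are hypothetical; the sketch only shows an
overflow-safe range check on unsigned values.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate [first, first + len) against an upper bound without relying
 * on wrapped arithmetic.
 */
static int
sketch_check_range(uint64_t first, uint64_t len, uint64_t limit)
{
	uint64_t last;

	if (len == 0)
		return (EINVAL);
	last = first + len;
	if (last < first)	/* the addition wrapped around */
		return (EINVAL);
	if (last > limit)
		return (EINVAL);
	/* No "first < 0" test: it can never be true for an unsigned type. */
	return (0);
}
```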
|
In preparation for allowing non-root users to create and access bhyve
VMs, add privilege checks for ioctls which operate on passthru devices.
Reviewed by: corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53144
|
On non-amd64 platforms, check for negative register indices. This isn't
required today since we match against individual register indices, but
we might as well check it. On amd64, add a comment explaining why we
permit negative register indices.
Use mallocarray() for allocating register arrays in the ioctl layer.
No functional change intended.
Reviewed by: corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53143
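
A userspace sketch of the two checks mentioned above. The register
bound is illustrative, and sketch_allocarray() is only a stand-in for an
overflow-checked array allocation; it does not reproduce the behaviour
of the kernel's mallocarray().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_REG_LAST	32	/* illustrative register count */

/* Reject negative indices as well as ones past the end. */
static int
sketch_check_regnum(int reg)
{
	if (reg < 0 || reg >= SKETCH_REG_LAST)
		return (EINVAL);
	return (0);
}

/* Refuse requests whose total size would overflow size_t. */
static void *
sketch_allocarray(size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size)
		return (NULL);
	return (malloc(nmemb * size));
}
```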
|
vm_smp_rendezvous() invokes a callback on all vCPUs, blocking the
initiator until all vCPUs have responded. vcpu_lock_all() blocks each
vCPU by waiting for it to go idle and setting the vCPU state to frozen.
These two operations can deadlock on each other, particularly when
booting a Windows guest, when vcpu_lock_all() blocks waiting for a
rendezvous initiator, and the initiator is blocked waiting for the vCPU
thread which called vcpu_lock_all() to invoke the rendezvous callback.
Implement vcpu_lock_all() in a way that avoids deadlocks with
vm_smp_rendezvous(). In particular, when traversing vCPUs, invoke the
rendezvous callback on the vCPU's behalf to help the initiator finish.
We can only safely do so when the vCPU is IDLE or we have already locked
it, otherwise we may be racing with the target vCPU thread. Thus:
- Use an exclusive lock to serialize vcpu_lock_all() callers, which lets
us lock vCPUs out of order without fear of deadlock with parallel
vcpu_lock_all() callers.
- If a rendezvous is pending, lock all idle vCPUs and invoke the
callback on their behalf. If the vcpu_lock_all() caller is itself a
vCPU thread, this will handle that thread.
- Block waiting for all non-idle vCPUs to idle, or until one of them
initiates a rendezvous, in which case we go back and invoke callbacks
on behalf of already-locked vCPUs.
Note that on !amd64 no changes are needed since there is no rendezvous
mechanism, so there is a separate vcpu_set_state_all() for them based on
the previous vcpu_lock_all(). These will be merged together once vcpu
state handling is consolidated into sys/dev/vmm.
Reviewed by: corvink (previous version)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D52968
|
This further consolidates handling of guest memory into MI code in
sys/dev/vmm.
No functional change intended.
Reviewed by: corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D53012
|
Otherwise we don't do anything to kick vcpu threads out of a sleep
state when destroying a VM. For instance, suppose a guest executes hlt
on amd64 or wfi on arm64 with interrupts disabled. Then,
bhyvectl --destroy will hang until the vcpu thread somehow comes out of
vm_handle_hlt()/vm_handle_wfi() since destroy_dev() is waiting for vCPU
threads to drain.
Note that on amd64, if hw.vmm.halt_detection is set to 1 (the default),
the guest will automatically exit in this case since it's treated as a
shutdown. But, the above should not hang if halt_detection is set to 0.
Here, vm_suspend() wakes up vcpu threads, and a subsequent attempt to
run the vCPU will result in an error which gets propagated to userspace,
allowing destroy_dev() to proceed.
Add a new suspend code for this purpose. Modify bhyve to exit with
status 4 ("exited due to an error") when it's received, since that's
what'll happen generally when the VM is destroyed asynchronously.
Reported by: def
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D51761
|
This change adds the necessary kernelspace bits required for
supporting NUMA domains in bhyve VMs.
The layout of system memory segments and how they're created has
been reworked. Each guest NUMA domain will now have its own memory
segment. Furthermore, this change allows users to tweak the domain's
backing vm_object domainset(9) policy.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D44565
|
Add sysctl descriptions, and remove surprising default text.
PR: 288437
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D51533
|
Reported by: acm
Fixes: b9ef152bec6c ("vmm: Merge vmm_dev.c")
|
On all three platforms supported by vmm, we have mostly duplicated code
to manage guest physical memory regions. Deduplicate much of this code
and move it into sys/dev/vmm/vmm_mem.c.
To avoid exporting struct vm outside of machdep vmm.c, add a new
struct vm_mem to contain the memory segment descriptors, and add a
vm_mem() accessor, akin to vm_vmspace(). This way vmm_mem.c can
implement its routines without needing to see the layout of struct vm.
The handling of the per-VM vmspace is also duplicated but will be moved
to vmm_mem.c in a follow-up patch.
On amd64, move the ppt_is_mmio() check out of vm_mem_allocated() to keep
the code MI, as PPT is only implemented on amd64. There are only a
couple of callers, so this is not unreasonable.
No functional change intended.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D48270
|
In commit a97f683fe3c4 I didn't add code to remove the vmmctl device
when vmm.ko is unloaded, so it would persist and prevent vmm.ko from
being re-loaded.
Extend vmmdev_cleanup() to destroy the vmmctl cdev. Also call
vmmdev_cleanup() if vmm_init() fails.
Reviewed by: corvink, andrew
Fixes: a97f683fe3c4 ("vmm: Add a device file interface for creating and destroying VMs")
Differential Revision: https://reviews.freebsd.org/D48269
|
CID: 1568045
Reported by: Coverity Scan
Reviewed by: markj
Fixes: 4008758105a6 ("vmm: Validate credentials when opening a vmmdev")
Differential Revision: https://reviews.freebsd.org/D48073
|
This supersedes the sysctl interface, which has the limitations of being
root-only and not supporting automatic resource destruction, i.e., we
cannot easily destroy VMs automatically when bhyve terminates.
For now, two ioctls are implemented: VMMCTL_VM_CREATE and
VMMCTL_VM_DESTROY. Eventually I would like to support tying a VM's
lifetime to that of the descriptor, so that it is automatically
destroyed when the descriptor is closed. However, this will require
some work in bhyve: when the guest wants to reboot, bhyve exits with a
status that indicates that it is to be restarted. This is incompatible
with the idea of tying a VM's lifetime to that of a descriptor, since we
want to avoid creating and destroying a VM across each reboot (as this
involves freeing all of the guest memory, among other things). One
possible design would be to decompose bhyve into two processes, a parent
which handles reboots, and a child which runs in capability mode and
handles guest execution.
In any case, this gets us closer to addressing the shortcomings
mentioned above.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D47028
|
The softc pointer is now unused, just remove it.
Reported by: se
Fixes: 66fc442421f8 ("vmm: Remove an incorrect credential check in vmmdev_open()")
|
Checking pointer equality here is too strict and can lead to incorrect
errors, as credentials are frequently copied to avoid reference counting
overhead.
The check is new with commit 4008758105a6 and was added with the goal of
allowing non-root users to create VMs. Just remove it for now.
Reported by: Alonso Cárdenas Márquez <acardenas@bsd-peru.org>
Reviewed by: jhb
Fixes: 4008758105a6 ("vmm: Validate credentials when opening a vmmdev")
Differential Revision: https://reviews.freebsd.org/D46535
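
Why pointer equality is too strict can be shown with a small userspace
model. The structure is a simplified stand-in for a credential, and the
value comparison is purely hypothetical: the commit removes the check
rather than replacing it.

```c
#include <stdbool.h>
#include <sys/types.h>

struct sketch_cred {
	uid_t uid;
	gid_t gid;
};

/* Too strict: a copy of the same credential fails this test. */
static bool
sketch_same_pointer(const struct sketch_cred *a, const struct sketch_cred *b)
{
	return (a == b);
}

/* Compares what the credentials say rather than where they live. */
static bool
sketch_same_identity(const struct sketch_cred *a, const struct sketch_cred *b)
{
	return (a->uid == b->uid && a->gid == b->gid);
}
```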
|
This avoids creating windows where a device file is accessible but the
device-specific field is not set.
Now that vmmdev_mtx is a sleepable lock, avoid dropping it while
creating device files. This makes it easier to handle races and
simplifies some code; for example, the VSC_LINKED flag is no longer
needed.
Suggested by: jhb
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D46488
|
This will make it easier to atomically create the device file and set
its si_drv1 member.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D46487
|
Rather than performing privilege checks after a specific VM's device
file is opened, do it once at the time the device file is opened. This
means that one can continue to access a VM via its device fd after
attaching to a jail which does not have vmm enabled, but this seems like
a reasonable semantic to have anyway.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D46486
|
For compat ioctls and structures, we use a mix of suffixes: _old,
_fbsd<version>, _<version>. Standardize on _<version> to make things
more consistent. No functional change intended.
Reported by: jhb
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46449
|
Otherwise they are globally visible (in jails with allow.vmm set),
instead of being restricted to the jail to which the VM belongs.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46448
|
vmmdev_lookup() is used from sysctl context to find a VM by name.
There, a reference credential is already passed, so use that instead of
assuming that it's the same as curthread->td_ucred, even though that's
true today. No functional change intended.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46447
|
The sole caller of this function already holds a pointer to the VM's
softc, so rather than passing the VM name and looking it up again, just
pass the softc pointer directly. This function is only called from an
ioctl context, so the softc structure will remain live.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46446
|
This will make it easy to share code with an ioctl handler which creates
VMs. No functional change intended.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46445
|
This will make it easy to share code with an ioctl handler which creates
VMs. No functional change intended.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46444
|
There is no reason to keep them in vmm_dev.h. No functional change
intended.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46432
|
This file contains the vmm device file implementation. Most of this
code is not machine-dependent and so shouldn't be duplicated this way.
Move most of it into a generic dev/vmm/vmm_dev.c. This will make it
easier to introduce a cdev-based interface for VM creation, which in
turn makes it possible to implement support for running bhyve as an
unprivileged user.
Machine-dependent ioctls continue to be handled in machine-dependent
code. To make the split a bit easier to handle, introduce a pair of
tables which define MI and MD ioctls. Each table entry can set flags
which determine which locks need to be held in order to execute the
handler. vmmdev_ioctl() now looks up the ioctl in one of the tables,
acquires locks and either handles the ioctl directly or calls
vmmdev_machdep_ioctl() to handle it.
No functional change intended. There is a lot of churn in this change
but the underlying logic in the ioctl handlers is the same. For now,
vmm_dev.h is still mostly separate, even though some parts could be
merged in principle. This would involve changing include paths for
userspace, though.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46431
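
The table-driven dispatch can be sketched in userspace as follows. The
flag names, command numbers and the choice to search the MI table before
the MD one are assumptions made for the example; actual lock acquisition
is left out.

```c
#include <stddef.h>

/* Hypothetical flags describing which locks a handler needs. */
#define SKETCH_LOCK_ONE_VCPU	0x1
#define SKETCH_LOCK_ALL_VCPUS	0x2

struct sketch_ioctl {
	unsigned long cmd;
	int flags;
};

static const struct sketch_ioctl sketch_mi_ioctls[] = {
	{ 100, SKETCH_LOCK_ONE_VCPU },
	{ 101, SKETCH_LOCK_ALL_VCPUS },
};

static const struct sketch_ioctl sketch_md_ioctls[] = {
	{ 200, 0 },
};

static const struct sketch_ioctl *
sketch_find(const struct sketch_ioctl *tbl, size_t n, unsigned long cmd)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].cmd == cmd)
			return (&tbl[i]);
	return (NULL);
}

/* Look in the MI table first, then the MD one; NULL means unknown. */
static const struct sketch_ioctl *
sketch_lookup(unsigned long cmd, int *is_md)
{
	const struct sketch_ioctl *e;

	*is_md = 0;
	e = sketch_find(sketch_mi_ioctls,
	    sizeof(sketch_mi_ioctls) / sizeof(sketch_mi_ioctls[0]), cmd);
	if (e == NULL) {
		e = sketch_find(sketch_md_ioctls,
		    sizeof(sketch_md_ioctls) / sizeof(sketch_md_ioctls[0]),
		    cmd);
		*is_md = (e != NULL);
	}
	return (e);
}
```

A caller would inspect the returned entry's flags to decide which locks
to take before running the MI handler or handing off to the MD one.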
|
There is a small difference between the arm64 and amd64 implementations:
the latter makes use of a "scope" to exclude AMD-specific stats on Intel
systems and vice-versa. Replace this with a more generic predicate
callback which can be used for the same purpose.
No functional change intended.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46430
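
A predicate callback of this kind might be modelled as below. The
structure, the stat descriptions and the vendor flag are all invented
for the sketch; a NULL predicate is assumed to mean "always applicable".

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_stat {
	const char *desc;
	bool (*pred)(void);	/* NULL means "always applicable" */
};

static bool sketch_is_intel = true;	/* stand-in for a CPU vendor test */

static bool
sketch_pred_intel(void)
{
	return (sketch_is_intel);
}

static bool
sketch_pred_amd(void)
{
	return (!sketch_is_intel);
}

static const struct sketch_stat sketch_stats[] = {
	{ "total exits", NULL },
	{ "intel-only counter", sketch_pred_intel },
	{ "amd-only counter", sketch_pred_amd },
};

/* A stat is exposed unless its predicate says otherwise. */
static bool
sketch_visible(const struct sketch_stat *st)
{
	return (st->pred == NULL || st->pred());
}
```

Because the predicate is an arbitrary function, platforms with no
vendor split can simply leave it NULL.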
|
No functional change intended.
Reviewed by: corvink, jhb, emaste
Differential Revision: https://reviews.freebsd.org/D46429
|