| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D47675
|
|
|
|
|
| |
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D47674
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have been incorrectly choosing the "hlt" idle method on modern AMD
EPYC servers for C1 idle. This is because AMD also uses the Functional
Fixed Hardware interface. Due to not parsing the table properly for
AMD, and due to a weird quirk where the mwait latency for C1 is
mis-interpreted as the latency for hlt, we wind up choosing hlt for
c1, which has a far higher wake up latency (similar to IO) of roughly
400us on my test system (AMD 7502P).
This patch fixes this by:
- Looking for AMD in addition to Intel in the FFH
(Note the vendor id of "2" for AMD is not publically documented, but
AMD has confirmed they are using "2" and has promised to document it.)
- Using mwait on AMD when specified in the table, and when CPUid says
its supported
- Fixing a weird issue where we copy the contents of cx_ptr for C1 and
when moving to C2, we do not reinitialize cx_ptr. This leads to
mwait being selected, and ignoring the specified i/o halt method
unless we clear mwait before looking at the table for C2.
Differential Revision: https://reviews.freebsd.org/D47444
Reviewed by: dab, kib, vangyzen
Sponsored by: Netflix
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of setting and clearing BUS_MASTER_RLD register on every C3
state enter/exit, set it only once if the system supports C3 state
and we are going to "disable" bus master arbitration while in it.
This is what Linux does for the past 14 years, and for even more time
this register is not implemented in a relevant hardware. Same time
since this is only a single bit in a bigger register, ACPI has to
do take a global lock and do read-modify-write for it, that is too
expensive, saved only by C3 not entered frequently, but enough to be
seen in idle system CPU profiles.
MFC after: 1 month
|
|
|
|
| |
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
|
|
|
|
|
| |
Reported by: Henrix
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D37081
|
|
|
|
|
| |
While CPPC is available on arm64 platforms with ACPI we don't know if we
need to work around issues with firmware there.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Buggy SMM implementations can hang while processing CPPC notifications.
This leads to some laptops (notably Thinkpads) hanging when the
hwpstate_intel driver is loaded.
Tell the SMM that we will handle CPPC notifications as described in:
- Intel® Processor Vendor-Specific ACPI
- Intel® 64 and IA-32 Architectures Software Developer’s Manual
CPPC events default to masked (disabled) so while we do not do any
handling right now this does not seem to lead to any issues.
This approach was found via this Linux Kernel patch:
https://lkml.org/lkml/2016/3/17/563
PR: 253288
Reviewed by: imp, jhb
Sponsored by: Modirum
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36699
|
| |
|
|
|
|
|
| |
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D34988
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When FreeBSD is running as dom0 (initial domain) on a Xen system it
has access to the native ACPI tables and is the OSPM. However the
hypervisor is the entity in charge of the CPU idle and frequency
states, and in order to perform this duty it requires information
found the ACPI dynamic tables that can only be parsed by the OSPM.
Introduce a new Xen specific ACPI driver to fetch the Processor
related information and upload it to Xen. Note that this driver needs
to take precedence over the generic ACPI CPU driver when running as
dom0, so downgrade the probe score of the native driver to
BUS_PROBE_DEFAULT in order for the Xen specific driver to use
BUS_PROBE_SPECIFIC.
Tested on an Intel NUC to successfully parse and upload both the Cx and
Px states to Xen.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb kib
Differential revision: https://reviews.freebsd.org/D34841
|
|
|
|
| |
MFC after: 2 weeks
|
| |
|
|
|
|
|
|
|
|
| |
In the acpi_cpu_postattach SYSINIT function cpu_softc may be NULL, e.g.
on arm64 when booting from FDT. Check it is not NULL at the start of
the function so we don't try to dereference a NULL pointer.
Sponsored by: The FreeBSD Foundation
|
|
|
|
| |
While there, remove couple unneeded global variables.
|
|
|
|
|
|
|
|
|
|
|
| |
There are already APIC ID, ACPI ID and OS ID for each CPU. In perfect
world all of those may match, but at least for SuperMicro server boards
none of them do. Plus none of them match the CPU devices listing order
by ACPI. Previous code used the ACPI device listing order to number
cpuX devices. It looked nice from NewBus perspective, but introduced
4th different set of IDs. Extremely confusing one, since in some places
the device unit numbers were treated as OS CPU IDs (coretemp), but not
in others (sysctl dev.cpu.X.%location).
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.
While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really. The conversion
is already done by AcpiGetTimerDuration(). Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU. Extra register read covers that time. But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.
On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.
On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.
Discussed with: kib
MFC after: 1 month
|
|
|
|
| |
Notes:
svn path=/head/; revision=365096
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Notes:
svn path=/head/; revision=358333
|
|
|
|
|
|
|
|
|
|
| |
when _CID match.
Reviewed by: jhb, imp
Differential Revision:https://reviews.freebsd.org/D16468
Notes:
svn path=/head/; revision=339754
|
|
|
|
|
|
|
| |
Have cpu0 be noisy, but all the other CPU devices be quiet on boot.
Notes:
svn path=/head/; revision=333334
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By the ACPI standard (ACPI 5 chapter 8.4 Declaring Processors) Processors
can be implemented in 2 distinct ways:
* Through a Processor object type (which provides P_BLK)
* Through a Device object type
Prior to this change, the FreeBSD driver only supported the former. AMD
Epyc / Poweredge systems we are testing both implement the latter only. Add
the missing support.
Because P_BLK is not defined in the device object case, C-states entering
must be completely controlled via _CST methods rather than P_LVL2/3.
John Baldwin points out that ACPI 6.0 formally deprecates the Processor
keyword, so eventually processors will only be enumerated as Device objects.
Submitted by: attilio
Reviewed by: jhb, markj, Anton Rang <rang AT acm.org>
Relnotes: maybe
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13457
Notes:
svn path=/head/; revision=326956
|
|
|
|
| |
Notes:
svn path=/head/; revision=324502
|
|
|
|
|
|
|
|
| |
Reported by: lwhsu, kib
Requested by: ngie
Notes:
svn path=/head/; revision=324136
|
|
|
|
| |
Notes:
svn path=/head/; revision=324109
|
|
|
|
| |
Notes:
svn path=/head/; revision=323076
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().
Reviewed by: gnn, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10313
Notes:
svn path=/head/; revision=316648
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Core2 and older Intel CPUs, where TSC stops in C2, system does not
allow C2 entrance if timecounter hardware is TSC. This is done by
tc_windup() which tests for TC_FLAGS_C2STOP flag of the new
timecounter and increases cpu_disable_c2_sleep if flag is set. Right
now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but
TSC is initialized too early for this variable to be set by
acpi_cpu.c.
There is no reason to require that ACPI reported C2 and deeper states
to set TC_FLAGS_C2STOP, so remove cpu_deepest_sleep test from
init_TSC_tc() condition. And since this is the only use of the
variable, remove it at all.
Reported and submitted by: Jia-Shiun Li <jiashiun@gmail.com>
Suggested by: jhb
MFC after: 2 weeks
Notes:
svn path=/head/; revision=314211
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
using the ACPI C1/mwait sleep method.
Previously, the mwait instruction would return when an interrupt was
pending; however, the idle loop did not actually enable interrupts when
this occurred. This led to a situation where the idle loop could quickly
spin through the C1/mwait sleep method a number of times when an interrupt
was pending. (Eventually, the situation corrected itself when something
other than an interrupt triggered the idle loop to either enable interrupts
or schedule another thread.)
Reviewed by: kib, imp (earlier version)
Input from: jhb
MFC after: 1 week
Sponsored by: Netflix
Notes:
svn path=/head/; revision=313447
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.
This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed). This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP. It also permits all CPUs to be available for
handling interrupts before any devices are probed.
This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot. Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.
However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system. In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU. Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.
Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code. This removes the need to treat the single-CPU boot environment
as a special case.
As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP). This will allow the option to be turned off
if need be during initial testing. I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0. Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.
These changes have only been tested on x86. Other platform maintainers
are encouraged to port their architectures over as well. The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).
PR: kern/199321
Reviewed by: markj, gnn, kib
Sponsored by: Netflix
Notes:
svn path=/head/; revision=299746
|
|
|
|
|
|
|
| |
Most affect comments, very few have user-visible effects.
Notes:
svn path=/head/; revision=298955
|
|
|
|
|
|
|
|
|
|
|
| |
Arguably we should only be doing the probe/attach to children of
these devices as well.
Tested by: Michal Stanek <mst_semihalf.com> (arm64)
Differential Revision: https://reviews.freebsd.org/D6133
Notes:
svn path=/head/; revision=298754
|
|
|
|
|
|
|
|
|
|
|
| |
Both of the callers were expecting the input cap_set to be modified.
This fixes them to request cap_set to be updated with the returned buffer.
Reviewed by: jkim
Differential Revision: https://reviews.freebsd.org/D6040
Notes:
svn path=/head/; revision=298484
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Eventually with earlier AP startup this code will change to call the
startup function synchronously instead of queueing the task. Moving
the time we queue the task should be a no-op since taskqueue threads
don't start executing tasks until much later, but this reduces the diff
with the earlier AP startup patches.
Sponsored by: Netflix
Notes:
svn path=/head/; revision=298425
|
|
|
|
| |
Notes:
svn path=/head/; revision=298379
|
|
|
|
|
|
|
| |
return buffer (yet).
Notes:
svn path=/head/; revision=298377
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This wrapper does not translate errors in the first word to ACPI
error status returns. Use this wrapper in the acpi_cpu(4) driver in
place of the existing _OSC code. While here, fix a bug where the wrong
count of words was passed when invoking _OSC.
Reviewed by: jkim
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6022
Notes:
svn path=/head/; revision=298370
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
drivers, one for fdt, one for acpi. It then uses this to decide if it will
use fdt or acpi.
The GICv2 (interrupt controller) and Generic Timer drivers have been
updated to handle both cases.
As this is early code we still need FDT to find the kernel console, and
some parts are still missing, including PCI support.
Differential Revision: https://reviews.freebsd.org/D2463
Reviewed by: jhb, jkim, emaste
Obtained from: ABT Systems Ltd
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Notes:
svn path=/head/; revision=284273
|
|
|
|
|
|
|
|
| |
Reported by: Coverity
CID: 1306132
Notes:
svn path=/head/; revision=284195
|
|
|
|
|
|
|
|
|
|
| |
bridges only supported Intel Pentium and Pentium II era processors and there
is no reason for hardware virtualizations to emulate these quirks.
MFC after: 1 week
Notes:
svn path=/head/; revision=283261
|
|
|
|
| |
Notes:
svn path=/head/; revision=282771
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states. Support C1 "I/O then halt" mode. See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.
Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.
In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.
Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated. Linux does not rely on the
ACPI to provide correct tables describing Cx modes. Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.
Tested by: pho (previous versions)
Sponsored by: The FreeBSD Foundation
Notes:
svn path=/head/; revision=282678
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
accidentally enable non-existent states.
This bug was triggered if ACPI advertises the presence of a C2 state
which we fail to parse via acpi_PkgGas due to our lack of support for
FFixedHW resources, and causes an immediate panic when an attempt is
made to enter the (NULL) state.
One affected platform is the EC2 c4.8xlarge VM instance type; there
may be others.
MFC after: 1 week
Thanks to: jkim, @_msw_
Notes:
svn path=/head/; revision=277318
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST). Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.
PR: 192316
Differential Revision: https://reviews.freebsd.org/D1441
No objection from: jkim
MFC after: 1 week
Notes:
svn path=/head/; revision=276724
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also disable a couple of ACPI devices that are not usable under Dom0.
To this end a couple of booleans are added that allow disabling ACPI
specific devices.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
x86/xen/xen_nexus.c:
- Return BUS_PROBE_SPECIFIC in the Xen Nexus attachement routine to
force the usage of the Xen Nexus.
- Attach the ACPI bus when running as Dom0.
dev/acpica/acpi_cpu.c:
dev/acpica/acpi_hpet.c:
dev/acpica/acpi_timer.c
- Add a variable that gates the addition of the devices.
x86/include/init.h:
- Declare variables that control the attachment of ACPI cpu, hpet and
timer devices.
Notes:
svn path=/head/; revision=269515
|
|
|
|
| |
Notes:
svn path=/head/; revision=267992
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
Notes:
svn path=/head/; revision=267985
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
Notes:
svn path=/head/; revision=267961
|