aboutsummaryrefslogtreecommitdiff
path: root/lib/libpmc/pmc.3
diff options
context:
space:
mode:
authorJoseph Koshy <jkoshy@FreeBSD.org>2008-09-16 16:58:24 +0000
committerJoseph Koshy <jkoshy@FreeBSD.org>2008-09-16 16:58:24 +0000
commit24f2c3f3940e9bde8177cb77f515aba459dd608a (patch)
treeb0008abce18358b0f2c6b7242b73ca32858b1860 /lib/libpmc/pmc.3
parentd41debca9326a4bbc6c0db0b860896b55916a5f5 (diff)
downloadsrc-24f2c3f3940e9bde8177cb77f515aba459dd608a.tar.gz
src-24f2c3f3940e9bde8177cb77f515aba459dd608a.zip
Notes
Diffstat (limited to 'lib/libpmc/pmc.3')
-rw-r--r--lib/libpmc/pmc.33125
1 files changed, 25 insertions, 3100 deletions
diff --git a/lib/libpmc/pmc.3 b/lib/libpmc/pmc.3
index a779cbc9a322..e7c2683283e2 100644
--- a/lib/libpmc/pmc.3
+++ b/lib/libpmc/pmc.3
@@ -23,7 +23,7 @@
.\"
.\" $FreeBSD$
.\"
-.Dd March 14, 2008
+.Dd September 16, 2008
.Os
.Dt PMC 3
.Sh NAME
@@ -163,8 +163,6 @@ PMC supported by this library are named by the
enumeration.
Supported PMC kinds include:
.Bl -tag -width PMC_CLASS_TSC -compact
-.It PMC_CLASS_TSC
-The timestamp counter on i386 and amd64 architecture CPUs.
.It PMC_CLASS_K7
Programmable hardware counters present in
.Tn "AMD Athlon"
@@ -173,6 +171,10 @@ CPUs.
Programmable hardware counters present in
.Tn "AMD Athlon64"
CPUs.
+.It PMC_CLASS_P4
+Programmable hardware counters present in
+.Tn "Intel Pentium 4"
+CPUs.
.It PMC_CLASS_P5
Programmable hardware counters present in
.Tn Intel
@@ -188,10 +190,8 @@ Programmable hardware counters present in
and
.Tn "Pentium M"
CPUs.
-.It PMC_CLASS_P4
-Programmable hardware counters present in
-.Tn "Intel Pentium 4"
-CPUs.
+.It PMC_CLASS_TSC
+The timestamp counter on i386 and amd64 architecture CPUs.
.El
.Ss PMC Capabilities
.Pp
@@ -405,8 +405,7 @@ being probed.
Event names are PMC architecture dependent, but the PMC library defines
machine independent aliases for commonly used events.
.Ss Event Name Aliases
-Event name aliases are CPU architecture independent names for commonly
-used events.
+Event name aliases are PMC-independent names for commonly used events.
The following aliases are known to this version of the
.Nm pmc
library:
@@ -431,3099 +430,19 @@ Measure the number of interrupts seen.
Measure the number of cycles the processor is not in a halted
or sleep state.
.El
-.Ss Time Stamp Counter (TSC)
-The timestamp counter is a monotonically non-decreasing counter that
-counts processor cycles.
-.Pp
-In the i386 architecture, this counter may
-be selected by requesting an event with event specifier
-.Dq Li tsc .
-The
-.Dq Li tsc
-event does not support any further qualifiers.
-It can only be allocated in system-wide counting mode,
-and is a read-only counter.
-Multiple processes are allowed to allocate the TSC.
-Once allocated, it may be read using the
-.Fn pmc_read
-function, or by using the RDTSC instruction.
-.Ss AMD (K7) PMCs
-These PMCs are present in the
-.Tn "AMD Athlon"
-series of CPUs and are documented in:
-.Rs
-.%B "AMD Athlon Processor x86 Code Optimization Guide"
-.%N "Publication No. 22007"
-.%D "February 2002"
-.%Q "Advanced Micro Devices, Inc."
-.Re
-.Pp
-Event specifiers for AMD K7 PMCs can have the following optional
-qualifiers:
-.Bl -tag -width indent
-.It Li count= Ns Ar value
-Configure the counter to increment only if the number of configured
-events measured in a cycle is greater than or equal to
-.Ar value .
-.It Li edge
-Configure the counter to only count negated-to-asserted transitions
-of the conditions expressed by the other qualifiers.
-In other words, the counter will increment only once whenever a given
-condition becomes true, irrespective of the number of clocks during
-which the condition remains true.
-.It Li inv
-Invert the sense of comparision when the
-.Dq Li count
-qualifier is present, making the counter to increment when the
-number of events per cycle is less than the value specified by
-the
-.Dq Li count
-qualifier.
-.It Li os
-Configure the PMC to count events happening at privilege level 0.
-.It Li unitmask= Ns Ar mask
-This qualifier is used to further qualify a select few events,
-.Dq Li k7-dc-refills-from-l2 ,
-.Dq Li k7-dc-refills-from-system
-and
-.Dq Li k7-dc-writebacks .
-Here
-.Ar mask
-is a string of the following characters optionally separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li m
-Count operations for lines in the
-.Dq Modified
-state.
-.It Li o
-Count operations for lines in the
-.Dq Owner
-state.
-.It Li e
-Count operations for lines in the
-.Dq Exclusive
-state.
-.It Li s
-Count operations for lines in the
-.Dq Shared
-state.
-.It Li i
-Count operations for lines in the
-.Dq Invalid
-state.
-.El
-.Pp
-If no
-.Dq Li unitmask
-qualifier is specified, the default is to count events for caches
-lines in any of the above states.
-.It Li usr
-Configure the PMC to count events occurring at privilege levels 1, 2
-or 3.
-.El
-.Pp
-If neither of the
-.Dq Li os
-or
-.Dq Li usr
-qualifiers were specified, the default is to enable both.
-.Pp
-The event specifiers supported on AMD K7 PMCs are:
-.Bl -tag -width indent
-.It Li k7-dc-accesses
-Count data cache accesses.
-.It Li k7-dc-misses
-Count data cache misses.
-.It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask
-Count data cache refills from L2 cache.
-This event may be further qualified using the
-.Dq Li unitmask
-qualifier.
-.It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask
-Count data cache refills from system memory.
-This event may be further qualified using the
-.Dq Li unitmask
-qualifier.
-.It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask
-Count data cache writebacks.
-This event may be further qualified using the
-.Dq Li unitmask
-qualifier.
-.It Li k7-l1-dtlb-miss-and-l2-dtlb-hits
-Count L1 DTLB misses and L2 DTLB hits.
-.It Li k7-l1-and-l2-dtlb-misses
-Count L1 and L2 DTLB misses.
-.It Li k7-misaligned-references
-Count misaligned data references.
-.It Li k7-ic-fetches
-Count instruction cache fetches.
-.It Li k7-ic-misses
-Count instruction cache misses.
-.It Li k7-l1-itlb-misses
-Count L1 ITLB misses that are L2 ITLB hits.
-.It Li k7-l1-l2-itlb-misses
-Count L1 (and L2) ITLB misses.
-.It Li k7-retired-instructions
-Count all retired instructions.
-.It Li k7-retired-ops
-Count retired ops.
-.It Li k7-retired-branches
-Count all retired branches (conditional, unconditional, exceptions
-and interrupts).
-.It Li k7-retired-branches-mispredicted
-Count all misprediced retired branches.
-.It Li k7-retired-taken-branches
-Count retired taken branches.
-.It Li k7-retired-taken-branches-mispredicted
-Count mispredicted taken branches that were retired.
-.It Li k7-retired-far-control-transfers
-Count retired far control transfers.
-.It Li k7-retired-resync-branches
-Count retired resync branches (non control transfer branches).
-.It Li k7-interrupts-masked-cycles
-Count the number of cycles when the processor's
-.Va IF
-flag was zero.
-.It Li k7-interrupts-masked-while-pending-cycles
-Count the number of cycles interrupts were masked while pending due
-to the processor's
-.Va IF
-flag being zero.
-.It Li k7-hardware-interrupts
-Count the number of taken hardware interrupts.
-.El
-.Ss AMD (K8) PMCs
-These PMCs are present in the
-.Tn "AMD Athlon64"
-and
-.Tn "AMD Opteron"
-series of CPUs.
-They are documented in:
-.Rs
-.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
-.%N "Publication No. 26094"
-.%D "April 2004"
-.%Q "Advanced Micro Devices, Inc."
-.Re
-.Pp
-Event specifiers for AMD K8 PMCs can have the following optional
-qualifiers:
-.Bl -tag -width indent
-.It Li count= Ns Ar value
-Configure the counter to increment only if the number of configured
-events measured in a cycle is greater than or equal to
-.Ar value .
-.It Li edge
-Configure the counter to only count negated-to-asserted transitions
-of the conditions expressed by the other fields.
-In other words, the counter will increment only once whenever a given
-condition becomes true, irrespective of the number of clocks during
-which the condition remains true.
-.It Li inv
-Invert the sense of comparision when the
-.Dq Li count
-qualifier is present, making the counter to increment when the
-number of events per cycle is less than the value specified by
-the
-.Dq Li count
-qualifier.
-.It Li mask= Ns Ar qualifier
-Many event specifiers for AMD K8 PMCs need to be additionally
-qualified using a mask qualifier.
-These additional qualifiers are event-specific and are documented
-along with their associated event specifiers below.
-.It Li os
-Configure the PMC to count events happening at privilege level 0.
-.It Li usr
-Configure the PMC to count events occurring at privilege levels 1, 2
-or 3.
-.El
-.Pp
-If neither of the
-.Dq Li os
-or
-.Dq Li usr
-qualifiers were specified, the default is to enable both.
-.Pp
-The event specifiers supported on AMD K8 PMCs are:
-.Bl -tag -width indent
-.It Li k8-bu-cpu-clk-unhalted
-Count the number of clock cycles when the CPU is not in the HLT or
-STPCLK states.
-.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
-Count fill requests that missed in the L2 cache.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li dc-fill
-Count data cache fill requests.
-.It Li ic-fill
-Count instruction cache fill requests.
-.It Li tlb-reload
-Count TLB reloads.
-.El
-.Pp
-The default is to count all types of requests.
-.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
-Count internally generated requests to the L2 cache.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li cancelled
-Count cancelled requests.
-.It Li dc-fill
-Count data cache fill requests.
-.It Li ic-fill
-Count instruction cache fill requests.
-.It Li tag-snoop
-Count tag snoop requests.
-.It Li tlb-reload
-Count TLB reloads.
-.El
-.Pp
-The default is to count all types of requests.
-.It Li k8-dc-access
-Count data cache accesses including microcode scratchpad accesses.
-.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
-Count data cache copyback operations.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li exclusive
-Count operations for lines in the
-.Dq exclusive
-state.
-.It Li invalid
-Count operations for lines in the
-.Dq invalid
-state.
-.It Li modified
-Count operations for lines in the
-.Dq modified
-state.
-.It Li owner
-Count operations for lines in the
-.Dq owner
-state.
-.It Li shared
-Count operations for lines in the
-.Dq shared
-state.
-.El
-.Pp
-The default is to count operations for lines in all the
-above states.
-.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
-Count data cache accesses by lock instructions.
-This event is only available on processors of revision C or later
-vintage.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li accesses
-Count data cache accesses by lock instructions.
-.It Li misses
-Count data cache misses by lock instructions.
-.El
-.Pp
-The default is to count all accesses.
-.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
-Count the number of dispatched prefetch instructions.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li load
-Count load operations.
-.It Li nta
-Count non-temporal operations.
-.It Li store
-Count store operations.
-.El
-.Pp
-The default is to count all operations.
-.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
-Count L1 DTLB misses that are L2 DTLB hits.
-.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
-Count L1 DTLB misses that are also misses in the L2 DTLB.
-.It Li k8-dc-microarchitectural-early-cancel-of-an-access
-Count microarchitectural early cancels of data cache accesses.
-.It Li k8-dc-microarchitectural-late-cancel-of-an-access
-Count microarchitectural late cancels of data cache accesses.
-.It Li k8-dc-misaligned-data-reference
-Count misaligned data references.
-.It Li k8-dc-miss
-Count data cache misses.
-.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
-Count one bit ECC errors found by the scrubber.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li scrubber
-Count scrubber detected errors.
-.It Li piggyback
-Count piggyback scrubber errors.
-.El
-.Pp
-The default is to count both kinds of errors.
-.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
-Count data cache refills from L2 cache.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li exclusive
-Count operations for lines in the
-.Dq exclusive
-state.
-.It Li invalid
-Count operations for lines in the
-.Dq invalid
-state.
-.It Li modified
-Count operations for lines in the
-.Dq modified
-state.
-.It Li owner
-Count operations for lines in the
-.Dq owner
-state.
-.It Li shared
-Count operations for lines in the
-.Dq shared
-state.
-.El
-.Pp
-The default is to count operations for lines in all the
-above states.
-.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
-Count data cache refills from system memory.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li exclusive
-Count operations for lines in the
-.Dq exclusive
-state.
-.It Li invalid
-Count operations for lines in the
-.Dq invalid
-state.
-.It Li modified
-Count operations for lines in the
-.Dq modified
-state.
-.It Li owner
-Count operations for lines in the
-.Dq owner
-state.
-.It Li shared
-Count operations for lines in the
-.Dq shared
-state.
-.El
-.Pp
-The default is to count operations for lines in all the
-above states.
-.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
-Count the number of dispatched FPU ops.
-This event is supported in revision B and later CPUs.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li add-pipe-excluding-junk-ops
-Count add pipe ops excluding junk ops.
-.It Li add-pipe-junk-ops
-Count junk ops in the add pipe.
-.It Li multiply-pipe-excluding-junk-ops
-Count multiply pipe ops excluding junk ops.
-.It Li multiply-pipe-junk-ops
-Count junk ops in the multiply pipe.
-.It Li store-pipe-excluding-junk-ops
-Count store pipe ops excluding junk ops
-.It Li store-pipe-junk-ops
-Count junk ops in the store pipe.
-.El
-.Pp
-The default is to count all types of ops.
-.It Li k8-fp-cycles-with-no-fpu-ops-retired
-Count cycles when no FPU ops were retired.
-This event is supported in revision B and later CPUs.
-.It Li k8-fp-dispatched-fpu-fast-flag-ops
-Count dispatched FPU ops that use the fast flag interface.
-This event is supported in revision B and later CPUs.
-.It Li k8-fr-decoder-empty
-Count cycles when there was nothing to dispatch (i.e., the decoder
-was empty).
-.It Li k8-fr-dispatch-stalls
-Count all dispatch stalls.
-.It Li k8-fr-dispatch-stall-for-segment-load
-Count dispatch stalls for segment loads.
-.It Li k8-fr-dispatch-stall-for-serialization
-Count dispatch stalls for serialization.
-.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
-Count dispatch stalls from branch abort to retiral.
-.It Li k8-fr-dispatch-stall-when-fpu-is-full
-Count dispatch stalls when the FPU is full.
-.It Li k8-fr-dispatch-stall-when-ls-is-full
-Count dispatch stalls when the load/store unit is full.
-.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
-Count dispatch stalls when the reorder buffer is full.
-.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
-Count dispatch stalls when reservation stations are full.
-.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
-Count dispatch stalls when waiting for all to be quiet.
-.\" XXX What does "waiting for all to be quiet" mean?
-.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
-Count dispatch stalls when a far control transfer or a resync branch
-is pending.
-.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
-Count FPU exceptions.
-This event is supported in revision B and later CPUs.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li sse-and-x87-microtraps
-Count SSE and x87 microtraps.
-.It Li sse-reclass-microfaults
-Count SSE reclass microfaults
-.It Li sse-retype-microfaults
-Count SSE retype microfaults
-.It Li x87-reclass-microfaults
-Count x87 reclass microfaults.
-.El
-.Pp
-The default is to count all types of exceptions.
-.It Li k8-fr-interrupts-masked-cycles
-Count cycles when interrupts were masked (by CPU RFLAGS field IF was zero).
-.It Li k8-fr-interrupts-masked-while-pending-cycles
-Count cycles while interrupts were masked while pending (i.e., cycles
-when INTR was asserted while CPU RFLAGS field IF was zero).
-.It Li k8-fr-number-of-breakpoints-for-dr0
-Count the number of breakpoints for DR0.
-.It Li k8-fr-number-of-breakpoints-for-dr1
-Count the number of breakpoints for DR1.
-.It Li k8-fr-number-of-breakpoints-for-dr2
-Count the number of breakpoints for DR2.
-.It Li k8-fr-number-of-breakpoints-for-dr3
-Count the number of breakpoints for DR3.
-.It Li k8-fr-retired-branches
-Count retired branches including exceptions and interrupts.
-.It Li k8-fr-retired-branches-mispredicted
-Count mispredicted retired branches.
-.It Li k8-fr-retired-far-control-transfers
-Count retired far control transfers (which are always mispredicted).
-.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
-Count retired fastpath double op instructions.
-This event is supported in revision B and later CPUs.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li low-op-pos-0
-Count instructions with the low op in position 0.
-.It Li low-op-pos-1
-Count instructions with the low op in position 1.
-.It Li low-op-pos-2
-Count instructions with the low op in position 2.
-.El
-.Pp
-The default is to count all types of instructions.
-.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
-Count retired FPU instructions.
-This event is supported in revision B and later CPUs.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li mmx-3dnow
-Count MMX and 3DNow!\& instructions.
-.It Li packed-sse-sse2
-Count packed SSE and SSE2 instructions.
-.It Li scalar-sse-sse2
-Count scalar SSE and SSE2 instructions
-.It Li x87
-Count x87 instructions.
-.El
-.Pp
-The default is to count all types of instructions.
-.It Li k8-fr-retired-near-returns
-Count retired near returns.
-.It Li k8-fr-retired-near-returns-mispredicted
-Count mispredicted near returns.
-.It Li k8-fr-retired-resyncs
-Count retired resyncs (non-control transfer branches).
-.It Li k8-fr-retired-taken-hardware-interrupts
-Count retired taken hardware interrupts.
-.It Li k8-fr-retired-taken-branches
-Count retired taken branches.
-.It Li k8-fr-retired-taken-branches-mispredicted
-Count retired taken branches that were mispredicted.
-.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
-Count retired taken branches that were mispredicted only due to an
-address miscompare.
-.It Li k8-fr-retired-uops
-Count retired uops.
-.It Li k8-fr-retired-x86-instructions
-Count retired x86 instructions including exceptions and interrupts.
-.It Li k8-ic-fetch
-Count instruction cache fetches.
-.It Li k8-ic-instruction-fetch-stall
-Count cycles in stalls due to instruction fetch.
-.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
-Count L1 ITLB misses that are L2 ITLB hits.
-.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
-Count ITLB misses that miss in both L1 and L2 ITLBs.
-.It Li k8-ic-microarchitectural-resync-by-snoop
-Count microarchitectural resyncs caused by snoops.
-.It Li k8-ic-miss
-Count instruction cache misses.
-.It Li k8-ic-refill-from-l2
-Count instruction cache refills from L2 cache.
-.It Li k8-ic-refill-from-system
-Count instruction cache refills from system memory.
-.It Li k8-ic-return-stack-hits
-Count hits to the return stack.
-.It Li k8-ic-return-stack-overflow
-Count overflows of the return stack.
-.It Li k8-ls-buffer2-full
-Count load/store buffer2 full events.
-.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
-Count locked operations.
-For revision C and later CPUs, the following qualifiers are supported:
-.Pp
-.Bl -tag -width indent -compact
-.It Li cycles-in-request
-Count the number of cycles in the lock request/grant stage.
-.It Li cycles-to-complete
-Count the number of cycles a lock takes to complete once it is
-non-speculative and is the older load/store operation.
-.It Li locked-instructions
-Count the number of lock instructions executed.
-.El
-.Pp
-The default is to count the number of lock instructions executed.
-.It Li k8-ls-microarchitectural-late-cancel
-Count microarchitectural late cancels of operations in the load/store
-unit.
-.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
-Count microarchitectural resyncs caused by self-modifying code.
-.It Li k8-ls-microarchitectural-resync-by-snoop
-Count microarchitectural resyncs caused by snoops.
-.It Li k8-ls-retired-cflush-instructions
-Count retired CFLUSH instructions.
-.It Li k8-ls-retired-cpuid-instructions
-Count retired CPUID instructions.
-.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
-Count segment register loads.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Bl -tag -width indent -compact
-.It Li cs
-Count CS register loads.
-.It Li ds
-Count DS register loads.
-.It Li es
-Count ES register loads.
-.It Li fs
-Count FS register loads.
-.It Li gs
-Count GS register loads.
-.\" .It Li hs
-.\" Count HS register loads.
-.\" XXX "HS" register?
-.It Li ss
-Count SS register loads.
-.El
-.Pp
-The default is to count all types of loads.
-.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
-Count memory controller bypass counter saturation events.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li dram-controller-interface-bypass
-Count DRAM controller interface bypass.
-.It Li dram-controller-queue-bypass
-Count DRAM controller queue bypass.
-.It Li memory-controller-hi-pri-bypass
-Count memory controller high priority bypasses.
-.It Li memory-controller-lo-pri-bypass
-Count memory controller low priority bypasses.
-.El
-.Pp
-.It Li k8-nb-memory-controller-dram-slots-missed
-Count memory controller DRAM command slots missed (in MemClks).
-.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
-Count memory controller page access events.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li page-conflict
-Count page conflicts.
-.It Li page-hit
-Count page hits.
-.It Li page-miss
-Count page misses.
-.El
-.Pp
-The default is to count all types of events.
-.It Li k8-nb-memory-controller-page-table-overflow
-Count memory control page table overflow events.
-.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
-Count probe events.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li probe-hit
-Count all probe hits.
-.It Li probe-hit-dirty-no-memory-cancel
-Count probe hits without memory cancels.
-.It Li probe-hit-dirty-with-memory-cancel
-Count probe hits with memory cancels.
-.It Li probe-miss
-Count probe misses.
-.El
-.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
-Count sized commands issued.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li nonpostwrszbyte
-.It Li nonpostwrszdword
-.It Li postwrszbyte
-.It Li postwrszdword
-.It Li rdszbyte
-.It Li rdszdword
-.It Li rdmodwr
-.El
-.Pp
-The default is to count all types of commands.
-.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
-Count memory control turnaround events.
-This event may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.\" XXX doc is unclear whether these are cycle counts or event counts
-.It Li dimm-turnaround
-Count DIMM turnarounds.
-.It Li read-to-write-turnaround
-Count read to write turnarounds.
-.It Li write-to-read-turnaround
-Count write to read turnarounds.
-.El
-.Pp
-The default is to count all types of events.
-.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
-.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
-.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
-Count events on the HyperTransport(tm) buses.
-These events may be further qualified using
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li buffer-release
-Count buffer release messages sent.
-.It Li command
-Count command messages sent.
-.It Li data
-Count data messages sent.
-.It Li nop
-Count nop messages sent.
-.El
-.Pp
-The default is to count all types of messages.
-.El
-.Ss Intel Pentium PMCS
-Intel Pentium PMCs are present in Intel
-.Tn Pentium
-and
-.Tn "Pentium MMX"
-processors.
-.Pp
-These CPUs have two counters.
-Some events may only be used on specific counters and some events
-are defined only on processors supporting the MMX instruction set.
-.Pp
-These PMCs are documented in
-.Rs
-.%B "Intel 64 and IA-32 Intel(R) Architectures Software Developer's Manual"
-.%T "Volume 3B: System Programming Guide, Part 2"
-.%N "Order Number 253669-024US"
-.%D "August 2007"
-.%Q "Intel Corporation"
-.Re
-.Pp
-Event specifiers for Intel Pentium PMCs can have the following common
-qualifiers:
-.Bl -tag -width indent
-.It Li duration
-Count duration (in clocks) of events.
-The default is to count events.
-.It Li os
-Measure events at privilege levels 0, 1 and 2.
-.It Li overflow
-Assert the external processor pin associated with a counter on counter
-overflow.
-.It Li usr
-Measure events at privilege level 3.
-.El
-.Pp
-Note that these PMCs do not have the ability to interrupt the CPU.
-.Pp
-The event specifiers supported by Intel Pentium PMCs are:
-.Bl -tag -width indent
-.It Li p5-any-segment-register-loaded
-The number of writes to any segment register, including the LDTR,
-GDTR, TR and IDTR.
-Far control transfers and task switches that involve privilege
-level changes will count this event twice.
-.It Li p5-bank-conflicts
-The number of actual bank conflicts.
-.It Li p5-branches
-The number of taken and not taken branches including branches, jumps, calls,
-software interrupts and interrupt returns.
-.It Li p5-breakpoint-match-on-dr0-register
-The number of matches on the DR0 breakpoint register.
-.It Li p5-breakpoint-match-on-dr1-register
-The number of matches on the DR1 breakpoint register.
-.It Li p5-breakpoint-match-on-dr2-register
-The number of matches on the DR2 breakpoint register.
-.It Li p5-breakpoint-match-on-dr3-register
-The number of matches on the DR3 breakpoint register.
-.It Li p5-btb-false-entries
-.Pq Tn Pentium MMX
-The number of false entries in the BTB.
-This event is only allocated on counter 0.
-.It Li p5-btb-hits
-The number of branches executed that hit in the branch table buffer.
-.It Li p5-btb-miss-prediction-on-not-taken-branch
-.Pq Tn Pentium MMX
-The number of times the BTB predicted a not-taken branch as taken.
-This event is only allocated on counter 1.
-.It Li p5-bus-cycle-duration
-The number of cycles while a bus cycle was in progress.
-.It Li p5-bus-ownership-latency
-.Pq Tn Pentium MMX
-The time from bus ownership being requested to ownership being granted.
-This event is only allocated on counter 0.
-.It Li p5-bus-ownership-transfers
-.Pq Tn Pentium MMX
-The number of bus ownership transfers.
-This event is only allocated on counter 1.
-.It Li p5-bus-utilization-due-to-processor-activity
-.Pq Tn Pentium MMX
-The number of clocks the bus is busy due to the processor's own
-activity.
-This event is only allocated on counter 0.
-.It Li p5-cache-line-sharing
-.Pq Tn Pentium MMX
-The number of shared data lines in L1 cache.
-This event is only allocated on counter 1.
-.It Li p5-cache-m-state-line-sharing
-.Pq Tn Pentium MMX
-The number of hits to an M- state line due to a memory access by
-another processor.
-This event is only allocated on counter 0.
-.It Li p5-code-cache-miss
-The number of instruction reads that miss the internal code cache.
-Both cacheable and uncacheable misses are counted.
-.It Li p5-code-read
-The number of instruction reads to both cacheable and uncacheable regions.
-.It Li p5-code-tlb-miss
-The number of instruction reads that miss the instruction TLB.
-Both cacheable and uncacheable unreads are counted.
-.It Li p5-d1-starvation-and-fifo-is-empty
-.Pq Tn Pentium MMX
-The number of times the D1 stage cannot issue any instructions because
-the FIFO was empty.
-This event is only allocated on counter 0.
-.It Li p5-d1-starvation-and-only-one-instruction-in-fifo
-.Pq Tn Pentium MMX
-The number of times the D1 stage could issue only one instruction
-because the FIFO had one instruction ready.
-This event is only allocated on counter 1.
-.It Li p5-data-cache-lines-written-back
-The number of data cache lines that are written back, including
-those caused by internal and external snoops.
-.It Li p5-data-cache-tlb-miss-stall-duration
-.Pq Tn Pentium MMX
-The number of clocks the pipeline is stalled due to a data cache
-TLB miss.
-This event is only allocated on counter 1.
-.It Li p5-data-read
-The number of memory data reads, counting internal data cache hits and
-misses.
-I/O and data memory accesses due to TLB miss processing are
-not included.
-Split cycle reads are counted individually.
-.It Li p5-data-read-miss
-The number of memory read accesses that miss the data cache, counting
-both cacheable and uncacheable accesses.
-Data accesses that are part of TLB miss processing are not included.
-I/O accesses are not included.
-.It Li p5-data-read-miss-or-write-miss
-The number of data reads and writes that miss the internal data cache,
-counting uncacheable accesses.
-Data accesses due to TLB miss processing are not counted.
-.It Li p5-data-read-or-write
-The number of data reads and writes including internal data cache hits
-and misses.
-Data reads due to TLB miss processing are not counted.
-.It Li p5-data-tlb-miss
-The number of misses to the data cache translation lookaside buffer.
-.It Li p5-data-write
-The number of memory data writes, counting internal data cache hits
-and misses.
-I/O is not included and split cycle writes are counted individually.
-.It Li p5-data-write-miss
-The number of memory write accesses that miss the data cache, counting
-both cacheable and uncacheable accesses.
-I/O accesses are not counted.
-.It Li p5-emms-instructions-executed
-.Pq Tn Pentium MMX
-The number of EMMS instructions executed.
-This event is only allocated on counter 0.
-.It Li p5-external-data-cache-snoop-hits
-The number of external snoops to the data cache that hit a valid line,
-or the data line fill buffer, or one of the write back buffers.
-.It Li p5-external-snoops
-The number of external snoop requests accepted, including snoops that
-hit in the code cache, the data cache and that hit in neither.
-.It Li p5-floating-point-stalls-duration
-.Pq Tn Pentium MMX
-The number of cycles the pipeline is stalled due to a floating point
-freeze.
-This event is only allocated on counter 0.
-.It Li p5-flops
-The number of floating point adds, subtracts, multiples, divides and
-square roots.
-Transcendental instructions trigger this event multiple times.
-Instructions generating divide-by-zero, negative square root, special
-operand and stack exceptions are not counted.
-Integer multiply instructions that use the x87 FPU are counted.
-.It Li p5-full-write-buffer-stall-duration-while-executing-mmx-instructions
-.Pq Tn Pentium MMX
-The number of clocks the pipeline has stalled due to full write
-buffers when executing MMX instructions.
-This event is only allocated on counter 0.
-.It Li p5-hardware-interrupts
-The number of taken INTR and NMI interrupts.
-.It Li p5-instructions-executed
-The number of instructions executed.
-Repeat prefixed instructions are counted only once.
-The HLT instruction is counted only once, irrespective of the number
-of cycles spent in the halted state.
-All hardware and software exceptions are counted as instructions, and
-fault handler invocations are also counted as instructions.
-.It Li p5-instructions-executed-v-pipe
-The number of instructions that executed in the V pipe.
-.It Li p5-io-read-or-write-cycle
-The number of bus cycles directed to I/O space.
-.It Li p5-locked-bus-cycle
-The number of locked bus cycles that occur on account of the lock
-prefixes, LOCK instructions, page table updates and descriptor table
-updates.
-.It Li p5-memory-accesses-in-both-pipes
-The number of data memory reads or writes that are paired in both pipes.
-.It Li p5-misaligned-data-memory-on-mmx-instructions
-.Pq Tn Pentium MMX
-The number of misaligned data memory references when executing MMX
-instructions.
-This event is only allocated on counter 0.
-.It Li p5-misaligned-data-memory-or-io-references
-The number of memory or I/O reads or writes that are not aligned on
-natural boundaries.
-2- and 4-byte accesses are counted as misaligned if they cross a 4
-byte boundary.
-.It Li p5-mispredicted-or-unpredicted-returns
-.Pq Tn Pentium MMX
-The number of returns predicted incorrectly or not at all, only
-counting RET instructions.
-This event is only allocated on counter 0.
-.It Li p5-mmx-instruction-data-read-misses
-.Pq Tn Pentium MMX
-The number of MMX instruction data read misses.
-This event is only allocated on counter 1.
-.It Li p5-mmx-instruction-data-reads
-.Pq Tn Pentium MMX
-The number of MMX instruction data reads.
-This event is only allocated on counter 0.
-.It Li p5-mmx-instruction-data-write-misses
-.Pq Tn Pentium MMX
-The number of data write misses caused by MMX instructions.
-This event is only allocated on counter 1.
-.It Li p5-mmx-instruction-data-writes
-.Pq Tn Pentium MMX
-The number of data writes caused by MMX instructions.
-This event is only allocated on counter 0.
-.It Li p5-mmx-instructions-executed-u-pipe
-.Pq Tn Pentium MMX
-The number of MMX instructions executed in the U pipe.
-This event is only allocated on counter 0.
-.It Li p5-mmx-instructions-executed-v-pipe
-The number of MMX instructions executed in the V pipe.
-This event is only allocated on counter 1.
-.It Li p5-mmx-multiply-unit-interlock
-.Pq Tn Pentium MMX
-The number of clocks the pipeline is stalled because the destination
-of a prior MMX multiply is not ready.
-This event is only allocated on counter 0.
-.It Li p5-movd-movq-store-stall-due-to-previous-mmx-operation
-.Pq Tn Pentium MMX
-The number of clocks a MOVD/MOVQ instruction stalled in the D2 stage
-of the pipeline due to a previous MMX instruction.
-This event is only allocated on counter 1.
-.It Li p5-noncacheable-memory-reads
-The number of bus cycles for non-cacheable instruction or data reads,
-including cycles caused by TLB misses.
-.It Li p5-number-of-cycles-not-in-halt-state
-.Pq Tn Pentium MMX
-The number of cycles the processor is not idle due to the HLT
-instruction.
-This event is only allocated on counter 0.
-.It Li p5-pipeline-agi-stalls
-The number of address generation interlock stalls.
-An AGI that occurs in both the U and V pipelines in the same clock
-signals the event twice.
-.It Li p5-pipeline-flushes
-The number of pipeline flushes that occur.
-Pipeline flushes are caused by branch mispredicts, exceptions,
-interrupts, some segment register loads, and BTB misses.
-Prefetch queue flushes due to serializing instructions are not
-counted.
-.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions
-.Pq Tn Pentium MMX
-The number of pipeline flushes due to wrong branch predictions
-resolved in either the E- or WB- stage of the pipeline.
-This event is only allocated on counter 0.
-.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions-resolved-in-wb-stage
-.Pq Tn Pentium MMX
-The number of pipeline flushes due to wrong branch predictions
-resolved in the stage of the pipeline.
-This event is only allocated on counter 1.
-.It Li p5-pipeline-stall-for-mmx-instruction-data-memory-reads
-.Pq Tn Pentium MMX
-The number of clocks during pipeline stalls caused by waiting MMX data
-memory reads.
-This event is only allocated on counter 0.
-.It Li p5-predicted-returns
-.Pq Tn Pentium MMX
-The number of predicted returns, whether correct or incorrect.
-This counter only counts RET instructions.
-This event is only allocated on counter 1.
-.It Li p5-returns
-.Pq Tn Pentium MMX
-The number of RET instructions executed.
-This event is only allocated on counter 0.
-.It Li p5-saturating-mmx-instructions-executed
-.Pq Tn Pentium MMX
-The number of saturating MMX instructions executed.
-This event is only allocated on counter 0.
-.It Li p5-saturations-performed
-.Pq Tn Pentium MMX
-The number of saturating MMX instructions executed when at least one
-of its results were actually saturated.
-This event is only allocated on counter 1.
-.It Li p5-stall-on-mmx-instruction-write-to-e-o-m-state-line
-.Pq Tn Pentium MMX
-The number of clocks during stalls on MMX instructions writing to
-E- or M- state cache lines.
-This event is only allocated on counter 1.
-.It Li p5-stall-on-write-to-an-e-or-m-state-line
-The number of stalls on a write to an exclusive or modified data cache
-line.
-.It Li p5-taken-branch-or-btb-hit
-The number of events that may cause a hit in the BTB, namely either
-taken branches or BTB hits.
-.It Li p5-taken-branches
-.Pq Tn Pentium MMX
-The number of taken branches.
-This event is only allocated on counter 1.
-.It Li p5-transitions-between-mmx-and-fp-instructions
-.Pq Tn Pentium MMX
-The number of transitions between MMX and floating-point instructions
-and vice-versa.
-This event is only allocated on counter 1.
-.It Li p5-waiting-for-data-memory-read-stall-duration
-The number of clocks the pipeline was stalled waiting for data
-memory reads.
-Data TLB misses processing is included in this count.
-.It Li p5-write-buffer-full-stall-duration
-The number of clocks while the pipeline was stalled due to write
-buffers being full.
-.It Li p5-write-hit-to-m-or-e-state-lines
-The number of writes that hit exclusive or modified lines in the data
-cache.
-.It Li p5-writes-to-noncacheable-memory
-.Pq Tn Pentium MMX
-The number of writes to non-cacheable memory, including write cycles
-caused by TLB misses and I/O writes.
-This event is only allocated on counter 1.
-.El
-.Ss Intel P6 PMCS
-Intel P6 PMCs are present in Intel
-.Tn "Pentium Pro" ,
-.Tn "Pentium II" ,
-.Tn Celeron ,
-.Tn "Pentium III"
-and
-.Tn "Pentium M"
-processors.
-.Pp
-These CPUs have two counters.
-Some events may only be used on specific counters and some events are
-defined only on specific processor models.
-.Pp
-These PMCs are documented in
-.Rs
-.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
-.%T "Volume 3: System Programming Guide"
-.%N "Order Number 245472-012"
-.%D 2003
-.%Q "Intel Corporation"
-.Re
-.Pp
-Some of these events are affected by processor errata described in
-.Rs
-.%B "Intel(R) Pentium(R) III Processor Specification Update"
-.%N "Document Number: 244453-054"
-.%D "April 2005"
-.%Q "Intel Corporation"
-.Re
-.Pp
-Event specifiers for Intel P6 PMCs can have the following common
-qualifiers:
-.Bl -tag -width indent
-.It Li cmask= Ns Ar value
-Configure the PMC to increment only if the number of configured
-events measured in a cycle is greater than or equal to
-.Ar value .
-.It Li edge
-Configure the PMC to count the number of deasserted to asserted
-transitions of the conditions expressed by the other qualifiers.
-If specified, the counter will increment only once whenever a
-condition becomes true, irrespective of the number of clocks during
-which the condition remains true.
-.It Li inv
-Invert the sense of comparision when the
-.Dq Li cmask
-qualifier is present, making the counter increment when the number of
-events per cycle is less than the value specified by the
-.Dq Li cmask
-qualifier.
-.It Li os
-Configure the PMC to count events happening at processor privilege
-level 0.
-.It Li umask= Ns Ar value
-This qualifier is used to further qualify the event selected (see
-below).
-.It Li usr
-Configure the PMC to count events occurring at privilege levels 1, 2
-or 3.
-.El
-.Pp
-If neither of the
-.Dq Li os
-or
-.Dq Li usr
-qualifiers are specified, the default is to enable both.
-.Pp
-The event specifiers supported by Intel P6 PMCs are:
-.Bl -tag -width indent
-.It Li p6-baclears
-Count the number of times a static branch prediction was made by the
-branch decoder because the BTB did not have a prediction.
-.It Li p6-br-bac-missp-exec
-.Pq Tn "Pentium M"
-Count the number of branch instructions executed that where
-mispredicted at the Front End (BAC).
-.It Li p6-br-bogus
-Count the number of bogus branches.
-.It Li p6-br-call-exec
-.Pq Tn "Pentium M"
-Count the number of call instructions executed.
-.It Li p6-br-call-missp-exec
-.Pq Tn "Pentium M"
-Count the number of call instructions executed that were mispredicted.
-.It Li p6-br-cnd-exec
-.Pq Tn "Pentium M"
-Count the number of conditional branch instructions executed.
-.It Li p6-br-cnd-missp-exec
-.Pq Tn "Pentium M"
-Count the number of conditional branch instructions executed that were
-mispredicted.
-.It Li p6-br-ind-call-exec
-.Pq Tn "Pentium M"
-Count the number of indirect call instructions executed.
-.It Li p6-br-ind-exec
-.Pq Tn "Pentium M"
-Count the number of indirect branch instructions executed.
-.It Li p6-br-ind-missp-exec
-.Pq Tn "Pentium M"
-Count the number of indirect branch instructions executed that were
-mispredicted.
-.It Li p6-br-inst-decoded
-Count the number of branch instructions decoded.
-.It Li p6-br-inst-exec
-.Pq Tn "Pentium M"
-Count the number of branch instructions executed but necessarily retired.
-.It Li p6-br-inst-retired
-Count the number of branch instructions retired.
-.It Li p6-br-miss-pred-retired
-Count the number of mispredicted branch instructions retired.
-.It Li p6-br-miss-pred-taken-ret
-Count the number of taken mispredicted branches retired.
-.It Li p6-br-missp-exec
-.Pq Tn "Pentium M"
-Count the number of branch instructions executed that were
-mispredicted at execution.
-.It Li p6-br-ret-bac-missp-exec
-.Pq Tn "Pentium M"
-Count the number of return instructions executed that were
-mispredicted at the Front End (BAC).
-.It Li p6-br-ret-exec
-.Pq Tn "Pentium M"
-Count the number of return instructions executed.
-.It Li p6-br-ret-missp-exec
-.Pq Tn "Pentium M"
-Count the number of return instructions executed that were
-mispredicted at execution.
-.It Li p6-br-taken-retired
-Count the number of taken branches retired.
-.It Li p6-btb-misses
-Count the number of branches for which the BTB did not produce a
-prediction.
-.It Li p6-bus-bnr-drv
-Count the number of bus clock cycles during which this processor is
-driving the BNR# pin.
-.It Li p6-bus-data-rcv
-Count the number of bus clock cycles during which this processor is
-receiving data.
-.It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier
-Count the number of clocks during which DRDY# is asserted.
-An additional qualifier may be specified, and comprises one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-hit-drv
-Count the number of bus clock cycles during which this processor is
-driving the HIT# pin.
-.It Li p6-bus-hitm-drv
-Count the number of bus clock cycles during which this processor is
-driving the HITM# pin.
-.It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier
-Count the number of clocks during with LOCK# is asserted on the
-external system bus.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-req-outstanding
-Count the number of bus requests outstanding in any given cycle.
-.It Li p6-bus-snoop-stall
-Count the number of clock cycles during which the bus is snoop stalled.
-.It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier
-Count the number of completed bus transactions of any kind.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier
-Count the number of burst read transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier
-Count the number of completed burst transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier
-Count the number of completed deferred transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier
-Count the number of completed instruction fetch transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier
-Count the number of completed invalidate transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier
-Count the number of completed memory transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier
-Count the number of completed partial write transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier
-Count the number of completed read-for-ownership transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier
-Count the number of completed I/O transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier
-Count the number of completed partial transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier
-Count the number of completed write-back transactions.
-An additional qualifier may be specified and comprises one of the following
-keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count transactions generated by any agent on the bus.
-.It Li self
-Count transactions generated by this processor.
-.El
-.Pp
-The default is to count operations generated by this processor.
-.It Li p6-cpu-clk-unhalted
-Count the number of cycles during with the processor was not halted.
-.Pp
-.Pq Tn "Pentium M"
-Count the number of cycles during with the processor was not halted
-and not in a thermal trip.
-.It Li p6-cycles-div-busy
-Count the number of cycles during which the divider is busy and cannot
-accept new divides.
-This event is only allocated on counter 0.
-.It Li p6-cycles-in-pending-and-masked
-Count the number of processor cycles for which interrupts were
-disabled and interrupts were pending.
-.It Li p6-cycles-int-masked
-Count the number of processor cycles for which interrupts were
-disabled.
-.It Li p6-data-mem-refs
-Count all loads and all stores using any memory type, including
-internal retries.
-Each part of a split store is counted separately.
-.It Li p6-dcu-lines-in
-Count the total lines allocated in the data cache unit.
-.It Li p6-dcu-m-lines-in
-Count the number of M state lines allocated in the data cache unit.
-.It Li p6-dcu-m-lines-out
-Count the number of M state lines evicted from the data cache unit.
-.It Li p6-dcu-miss-outstanding
-Count the weighted number of cycles while a data cache unit miss is
-outstanding, incremented by the number of outstanding cache misses at
-any time.
-.It Li p6-div
-Count the number of integer and floating-point divides including
-speculative divides.
-This event is only allocated on counter 1.
-.It Li p6-emon-esp-uops
-.Pq Tn "Pentium M"
-Count the total number of micro-ops.
-.It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium M"
-Count the number of
-.Tn "Enhanced Intel SpeedStep"
-transitions.
-An additional qualifier may be specified, and can be one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count all transitions.
-.It Li freq
-Count only frequency transitions.
-.El
-.Pp
-The default is to count all transitions.
-.It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium M"
-Count the number of retired fused micro-ops.
-An additional qualifier may be specified, and may be one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count all fused micro-ops.
-.It Li loadop
-Count only load and op micro-ops.
-.It Li stdsta
-Count only STD/STA micro-ops.
-.El
-.Pp
-The default is to count all fused micro-ops.
-.It Li p6-emon-kni-comp-inst-ret
-.Pq Tn "Pentium III"
-Count the number of SSE computational instructions retired.
-An additional qualifier may be specified, and comprises one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li packed-and-scalar
-Count packed and scalar operations.
-.It Li scalar
-Count scalar operations only.
-.El
-.Pp
-The default is to count packed and scalar operations.
-.It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium III"
-Count the number of SSE instructions retired.
-An additional qualifier may be specified, and comprises one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li packed-and-scalar
-Count packed and scalar operations.
-.It Li scalar
-Count scalar operations only.
-.El
-.Pp
-The default is to count packed and scalar operations.
-.It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium III"
-Count the number of SSE prefetch or weakly ordered instructions
-dispatched (including speculative prefetches).
-An additional qualifier may be specified, and comprises one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li nta
-Count non-temporal prefetches.
-.It Li t1
-Count prefetches to L1.
-.It Li t2
-Count prefetches to L2.
-.It Li wos
-Count weakly ordered stores.
-.El
-.Pp
-The default is to count non-temporal prefetches.
-.It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium III"
-Count the number of prefetch or weakly ordered instructions that miss
-all caches.
-An additional qualifier may be specified, and comprises one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li nta
-Count non-temporal prefetches.
-.It Li t1
-Count prefetches to L1.
-.It Li t2
-Count prefetches to L2.
-.It Li wos
-Count weakly ordered stores.
-.El
-.Pp
-The default is to count non-temporal prefetches.
-.It Li p6-emon-pref-rqsts-dn
-.Pq Tn "Pentium M"
-Count the number of downward prefetches issued.
-.It Li p6-emon-pref-rqsts-up
-.Pq Tn "Pentium M"
-Count the number of upward prefetches issued.
-.It Li p6-emon-simd-instr-retired
-.Pq Tn "Pentium M"
-Count the number of retired
-.Tn MMX
-instructions.
-.It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium M"
-Count the number of computational SSE instructions retired.
-An additional qualifier may be specified and can be one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li sse-packed-single
-Count SSE packed-single instructions.
-.It Li sse-scalar-single
-Count SSE scalar-single instructions.
-.It Li sse2-packed-double
-Count SSE2 packed-double instructions.
-.It Li sse2-scalar-double
-Count SSE2 scalar-double instructions.
-.El
-.Pp
-The default is to count SSE packed-single instructions.
-.It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifer
-.Pp
-.Pq Tn "Pentium M"
-Count the number of SSE instructions retired.
-An additional qualifier can be specified, and can be one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li sse-packed-single
-Count SSE packed-single instructions.
-.It Li sse-packed-single-scalar-single
-Count SSE packed-single and scalar-single instructions.
-.It Li sse2-packed-double
-Count SSE2 packed-double instructions.
-.It Li sse2-scalar-double
-Count SSE2 scalar-double instructions.
-.El
-.Pp
-The default is to count SSE packed-single instructions.
-.It Li p6-emon-synch-uops
-.Pq Tn "Pentium M"
-Count the number of sync micro-ops.
-.It Li p6-emon-thermal-trip
-.Pq Tn "Pentium M"
-Count the duration or occurrences of thermal trips.
-Use the
-.Dq Li edge
-qualifier to count occurrences of thermal trips.
-.It Li p6-emon-unfusion
-.Pq Tn "Pentium M"
-Count the number of unfusion events in the reorder buffer.
-.It Li p6-flops
-Count the number of computational floating point operations retired.
-This event is only allocated on counter 0.
-.It Li p6-fp-assist
-Count the number of floating point exceptions handled by microcode.
-This event is only allocated on counter 1.
-.It Li p6-fp-comps-ops-exe
-Count the number of computation floating point operations executed.
-This event is only allocated on counter 0.
-.It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium II" , Tn "Pentium III"
-Count the number of transitions between MMX and floating-point
-instructions.
-An additional qualifier may be specified, and comprises one of the
-following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li mmxtofp
-Count transitions from MMX instructions to floating-point instructions.
-.It Li fptommx
-Count transitions from floating-point instructions to MMX instructions.
-.El
-.Pp
-The default is to count MMX to floating-point transitions.
-.It Li p6-hw-int-rx
-Count the number of hardware interrupts received.
-.It Li p6-ifu-fetch
-Count the number of instruction fetches, both cacheable and non-cacheable.
-.It Li p6-ifu-fetch-miss
-Count the number of instruction fetch misses (i.e., those that produce
-memory accesses).
-.It Li p6-ifu-mem-stall
-Count the number of cycles instruction fetch is stalled for any reason.
-.It Li p6-ild-stall
-Count the number of cycles the instruction length decoder is stalled.
-.It Li p6-inst-decoded
-Count the number of instructions decoded.
-.It Li p6-inst-retired
-Count the number of instructions retired.
-.It Li p6-itlb-miss
-Count the number of instruction TLB misses.
-.It Li p6-l2-ads
-Count the number of L2 address strobes.
-.It Li p6-l2-dbus-busy
-Count the number of cycles during which the L2 cache data bus was busy.
-.It Li p6-l2-dbus-busy-rd
-Count the number of cycles during which the L2 cache data bus was busy
-transferring read data from L2 to the processor.
-.It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier
-Count the number of L2 instruction fetches.
-An additional qualifier may be specified and comprises a list of the following
-keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li e
-Count operations affecting E (exclusive) state lines.
-.It Li i
-Count operations affecting I (invalid) state lines.
-.It Li m
-Count operations affecting M (modified) state lines.
-.It Li s
-Count operations affecting S (shared) state lines.
-.El
-.Pp
-The default is to count operations affecting all (MESI) state lines.
-.It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier
-Count the number of L2 data loads.
-An additional qualifier may be specified and comprises a list of the following
-keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li both
-.Pq Tn "Pentium M"
-Count both hardware-prefetched lines and non-hardware-prefetched lines.
-.It Li e
-Count operations affecting E (exclusive) state lines.
-.It Li hw
-.Pq Tn "Pentium M"
-Count hardware-prefetched lines only.
-.It Li i
-Count operations affecting I (invalid) state lines.
-.It Li m
-Count operations affecting M (modified) state lines.
-.It Li nonhw
-.Pq Tn "Pentium M"
-Exclude hardware-prefetched lines.
-.It Li s
-Count operations affecting S (shared) state lines.
-.El
-.Pp
-The default on processors other than
-.Tn "Pentium M"
-processors is to count operations affecting all (MESI) state lines.
-The default on
-.Tn "Pentium M"
-processors is to count both hardware-prefetched and
-non-hardware-prefetch operations on all (MESI) state lines.
-.Pq Errata
-This event is affected by processor errata E53.
-.It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier
-Count the number of L2 lines allocated.
-An additional qualifier may be specified and comprises a list of the following
-keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li both
-.Pq Tn "Pentium M"
-Count both hardware-prefetched lines and non-hardware-prefetched lines.
-.It Li e
-Count operations affecting E (exclusive) state lines.
-.It Li hw
-.Pq Tn "Pentium M"
-Count hardware-prefetched lines only.
-.It Li i
-Count operations affecting I (invalid) state lines.
-.It Li m
-Count operations affecting M (modified) state lines.
-.It Li nonhw
-.Pq Tn "Pentium M"
-Exclude hardware-prefetched lines.
-.It Li s
-Count operations affecting S (shared) state lines.
-.El
-.Pp
-The default on processors other than
-.Tn "Pentium M"
-processors is to count operations affecting all (MESI) state lines.
-The default on
-.Tn "Pentium M"
-processors is to count both hardware-prefetched and
-non-hardware-prefetch operations on all (MESI) state lines.
-.Pq Errata
-This event is affected by processor errata E45.
-.It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier
-Count the number of L2 lines evicted.
-An additional qualifier may be specified and comprises a list of the following
-keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li both
-.Pq Tn "Pentium M"
-Count both hardware-prefetched lines and non-hardware-prefetched lines.
-.It Li e
-Count operations affecting E (exclusive) state lines.
-.It Li hw
-.Pq Tn "Pentium M"
-Count hardware-prefetched lines only.
-.It Li i
-Count operations affecting I (invalid) state lines.
-.It Li m
-Count operations affecting M (modified) state lines.
-.It Li nonhw
-.Pq Tn "Pentium M" only
-Exclude hardware-prefetched lines.
-.It Li s
-Count operations affecting S (shared) state lines.
-.El
-.Pp
-The default on processors other than
-.Tn "Pentium M"
-processors is to count operations affecting all (MESI) state lines.
-The default on
-.Tn "Pentium M"
-processors is to count both hardware-prefetched and
-non-hardware-prefetch operations on all (MESI) state lines.
-.Pq Errata
-This event is affected by processor errata E45.
-.It Li p6-l2-m-lines-inm
-Count the number of modified lines allocated in L2 cache.
-.It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier
-Count the number of L2 M-state lines evicted.
-.Pp
-.Pq Tn "Pentium M"
-On these processors an additional qualifier may be specified and
-comprises a list of the following keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li both
-Count both hardware-prefetched lines and non-hardware-prefetched lines.
-.It Li hw
-Count hardware-prefetched lines only.
-.It Li nonhw
-Exclude hardware-prefetched lines.
-.El
-.Pp
-The default is to count both hardware-prefetched and
-non-hardware-prefetch operations.
-.Pq Errata
-This event is affected by processor errata E53.
-.It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier
-Count the total number of L2 requests.
-An additional qualifier may be specified and comprises a list of the following
-keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li e
-Count operations affecting E (exclusive) state lines.
-.It Li i
-Count operations affecting I (invalid) state lines.
-.It Li m
-Count operations affecting M (modified) state lines.
-.It Li s
-Count operations affecting S (shared) state lines.
-.El
-.Pp
-The default is to count operations affecting all (MESI) state lines.
-.It Li p6-l2-st
-Count the number of L2 data stores.
-An additional qualifier may be specified and comprises a list of the following
-keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li e
-Count operations affecting E (exclusive) state lines.
-.It Li i
-Count operations affecting I (invalid) state lines.
-.It Li m
-Count operations affecting M (modified) state lines.
-.It Li s
-Count operations affecting S (shared) state lines.
-.El
-.Pp
-The default is to count operations affecting all (MESI) state lines.
-.It Li p6-ld-blocks
-Count the number of load operations delayed due to store buffer blocks.
-.It Li p6-misalign-mem-ref
-Count the number of misaligned data memory references (crossing a 64
-bit boundary).
-.It Li p6-mmx-assist
-.Pq Tn "Pentium II" , Tn "Pentium III"
-Count the number of MMX assists executed.
-.It Li p6-mmx-instr-exec
-.Pq Tn Celeron , Tn "Pentium II"
-Count the number of MMX instructions executed, except MOVQ and MOVD
-stores from register to memory.
-.It Li p6-mmx-instr-ret
-.Pq Tn "Pentium II"
-Count the number of MMX instructions retired.
-.It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium II" , Tn "Pentium III"
-Count the number of MMX instructions executed.
-An additional qualifier may be specified and comprises a list of
-the following keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li pack
-Count MMX pack operation instructions.
-.It Li packed-arithmetic
-Count MMX packed arithmetic instructions.
-.It Li packed-logical
-Count MMX packed logical instructions.
-.It Li packed-multiply
-Count MMX packed multiply instructions.
-.It Li packed-shift
-Count MMX packed shift instructions.
-.It Li unpack
-Count MMX unpack operation instructions.
-.El
-.Pp
-The default is to count all operations.
-.It Li p6-mmx-sat-instr-exec
-.Pq Tn "Pentium II" , Tn "Pentium III"
-Count the number of MMX saturating instructions executed.
-.It Li p6-mmx-uops-exec
-.Pq Tn "Pentium II" , Tn "Pentium III"
-Count the number of MMX micro-ops executed.
-.It Li p6-mul
-Count the number of integer and floating-point multiplies, including
-speculative multiplies.
-This event is only allocated on counter 1.
-.It Li p6-partial-rat-stalls
-Count the number of cycles or events for partial stalls.
-.It Li p6-resource-stalls
-Count the number of cycles there was a resource related stall of any kind.
-.It Li p6-ret-seg-renames
-.Pq Tn "Pentium II" , Tn "Pentium III"
-Count the number of segment register rename events retired.
-.It Li p6-sb-drains
-Count the number of cycles the store buffer is draining.
-.It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier
-.Pq Tn "Pentium II" , Tn "Pentium III"
-Count the number of segment register renames.
-An additional qualifier may be specified, and comprises a list of the
-following keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li ds
-Count renames for segment register DS.
-.It Li es
-Count renames for segment register ES.
-.It Li fs
-Count renames for segment register FS.
-.It Li gs
-Count renames for segment register GS.
-.El
-.Pp
-The default is to count operations affecting all segment registers.
-.It Li p6-seg-rename-stalls
-.Pq Tn "Pentium II" , Tn "Pentium III"
-Count the number of segment register renaming stalls.
-An additional qualifier may be specified, and comprises a list of the
-following keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li ds
-Count stalls for segment register DS.
-.It Li es
-Count stalls for segment register ES.
-.It Li fs
-Count stalls for segment register FS.
-.It Li gs
-Count stalls for segment register GS.
-.El
-.Pp
-The default is to count operations affecting all the segment registers.
-.It Li p6-segment-reg-loads
-Count the number of segment register loads.
-.It Li p6-uops-retired
-Count the number of micro-ops retired.
-.El
-.Ss Intel P4 PMCS
-Intel P4 PMCs are present in Intel
-.Tn "Pentium 4"
-and
-.Tn Xeon
-processors.
-These PMCs are documented in
-.Rs
-.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
-.%T "Volume 3: System Programming Guide"
-.%N "Order Number 245472-012"
-.%D 2003
-.%Q "Intel Corporation"
-.Re
-Further information about using these PMCs may be found in
-.Rs
-.%B "IA-32 Intel(R) Architecture Optimization Guide"
-.%D 2003
-.%N "Order Number 248966-009"
-.%Q "Intel Corporation"
-.Re
-Some of these events are affected by processor errata described in
-.Rs
-.%B "Intel(R) Pentium(R) 4 Processor Specification Update"
-.%N "Document Number: 249199-059"
-.%D "April 2005"
-.%Q "Intel Corporation"
-.Re
-.Pp
-Event specifiers for Intel P4 PMCs can have the following common
-qualifiers:
-.Bl -tag -width indent
-.It Li active= Ns Ar choice
-(On P4 HTT CPUs) Filter event counting based on which logical
-processors are active.
-The allowed values of
-.Ar choice
-are:
-.Pp
-.Bl -tag -width indent -compact
-.It Li any
-Count when either logical processor is active.
-.It Li both
-Count when both logical processors are active.
-.It Li none
-Count only when neither logical processor is active.
-.It Li single
-Count only when one logical processor is active.
-.El
-.Pp
-The default is
-.Dq Li both .
-.It Li cascade
-Configure the PMC to cascade onto its partner.
-See
-.Sx "Cascading P4 PMCs"
-below for more information.
-.It Li edge
-Configure the counter to count false to true transitions of the threshold
-comparision output.
-This qualifier only takes effect if a threshold qualifier has also been
-specified.
-.It Li complement
-Configure the counter to increment only when the event count seen is
-less than the threshold qualifier value specified.
-.It Li mask= Ns Ar qualifier
-Many event specifiers for Intel P4 PMCs need to be additionally
-qualified using a mask qualifier.
-The allowed syntax for these qualifiers is event specific and is
-described along with the events.
-.It Li os
-Configure the PMC to count when the CPL of the processor is 0.
-.It Li precise
-Select precise event based sampling.
-Precise sampling is supported by the hardware for a limited set of
-events.
-.It Li tag= Ns Ar value
-Configure the PMC to tag the internal uop selected by the other
-fields in this event specifier with value
-.Ar value .
-This feature is used when cascading PMCs.
-.It Li threshold= Ns Ar value
-Configure the PMC to increment only when the event counts seen are
-greater than the specified threshold value
-.Ar value .
-.It Li usr
-Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
-.El
-.Pp
-If neither of the
-.Dq Li os
-or
-.Dq Li usr
-qualifiers are specified, the default is to enable both.
-.Pp
-On Intel Pentium 4 processors with HTT, events are
-divided into two classes:
-.Pp
-.Bl -tag -width indent -compact
-.It "TS Events"
-are those where hardware can differentiate between events
-generated on one logical processor from those generated on the
-other.
-.It "TI Events"
-are those where hardware cannot differentiate between events
-generated by multiple logical processors in a package.
-.El
-.Pp
-Only TS events are allowed for use with process-mode PMCs on
-Pentium-4/HTT CPUs.
-.Pp
-The event specifiers supported by Intel P4 PMCs are:
-.Pp
-.Bl -tag -width indent
-.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
-operands.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count all uops operating on 128 bit SIMD integer operands in memory or
-XMM register.
-.El
-.Pp
-If an instruction contains more than one 128 bit MMX uop, then each
-uop will be counted.
-.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count MMX instructions that operate on 64 bit SIMD operands.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count all uops operating on 64 bit SIMD integer operands in memory or
-in MMX registers.
-.El
-.Pp
-If an instruction contains more than one 64 bit MMX uop, then each
-uop will be counted.
-.It Li p4-b2b-cycles
-.Pq "TI event"
-Count back-to-back bus cycles.
-Further documentation for this event is unavailable.
-.It Li p4-bnr
-.Pq "TI event"
-Count bus-not-ready conditions.
-Further documentation for this event is unavailable.
-.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
-.Pq "TS event"
-Count instruction fetch requests qualified by additional
-flags specified in
-.Ar qualifier .
-At this point only one flag is supported:
-.Pp
-.Bl -tag -width indent -compact
-.It Li tcmiss
-Count trace cache lookup misses.
-.El
-.Pp
-The default qualifier is also
-.Dq Li mask=tcmiss .
-.It Li p4-branch-retired Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Counts retired branches.
-Qualifier
-.Ar flags
-is a list of the following
-.Ql +
-separated strings:
-.Pp
-.Bl -tag -width indent -compact
-.It Li mmnp
-Count branches not-taken and predicted.
-.It Li mmnm
-Count branches not-taken and mis-predicted.
-.It Li mmtp
-Count branches taken and predicted.
-.It Li mmtm
-Count branches taken and mis-predicted.
-.El
-.Pp
-The default qualifier counts all four kinds of branches.
-.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
-.Pq "TS event"
-Count the number of entries (clipped at 15) currently active in the
-BSQ.
-Qualifier
-.Ar qualifier
-is a
-.Ql +
-separated set of the following flags:
-.Pp
-.Bl -tag -width indent -compact
-.It Li req-type0 , Li req-type1
-Forms a 2-bit number used to select the request type encoding:
-.Pp
-.Bl -tag -width indent -compact
-.It Li 0
-reads excluding read invalidate
-.It Li 1
-read invalidates
-.It Li 2
-writes other than writebacks
-.It Li 3
-writebacks
-.El
-.Pp
-Bit
-.Dq Li req-type1
-is the MSB for this two bit number.
-.It Li req-len0 , Li req-len1
-Forms a two-bit number that specifies the request length encoding:
-.Pp
-.Bl -tag -width indent -compact
-.It Li 0
-0 chunks
-.It Li 1
-1 chunk
-.It Li 3
-8 chunks
-.El
-.Pp
-Bit
-.Dq Li req-len1
-is the MSB for this two bit number.
-.It Li req-io-type
-Count requests that are input or output requests.
-.It Li req-lock-type
-Count requests that lock the bus.
-.It Li req-lock-cache
-Count requests that lock the cache.
-.It Li req-split-type
-Count requests that is a bus 8-byte chunk that is split across an
-8-byte boundary.
-.It Li req-dem-type
-Count requests that are demand (not prefetches) if set.
-Count requests that are prefetches if not set.
-.It Li req-ord-type
-Count requests that are ordered.
-.It Li mem-type0 , Li mem-type1 , Li mem-type2
-Forms a 3-bit number that specifies a memory type encoding:
-.Pp
-.Bl -tag -width indent -compact
-.It Li 0
-UC
-.It Li 1
-USWC
-.It Li 4
-WT
-.It Li 5
-WP
-.It Li 6
-WB
-.El
-.Pp
-Bit
-.Dq Li mem-type2
-is the MSB of this 3-bit number.
-.El
-.Pp
-The default qualifier has all the above bits set.
-.Pp
-Edge triggering using the
-.Dq Li edge
-qualifier should not be used with this event when counting cycles.
-.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
-.Pq "TS event"
-Count allocations in the bus sequence unit according to the flags
-specified in
-.Ar qualifier ,
-which is a
-.Ql +
-separated set of the following flags:
-.Pp
-.Bl -tag -width indent -compact
-.It Li req-type0 , Li req-type1
-Forms a 2-bit number used to select the request type encoding:
-.Pp
-.Bl -tag -width indent -compact
-.It Li 0
-reads excluding read invalidate
-.It Li 1
-read invalidates
-.It Li 2
-writes other than writebacks
-.It Li 3
-writebacks
-.El
-.Pp
-Bit
-.Dq Li req-type1
-is the MSB for this two bit number.
-.It Li req-len0 , Li req-len1
-Forms a two-bit number that specifies the request length encoding:
-.Pp
-.Bl -tag -width indent -compact
-.It Li 0
-0 chunks
-.It Li 1
-1 chunk
-.It Li 3
-8 chunks
-.El
-.Pp
-Bit
-.Dq Li req-len1
-is the MSB for this two bit number.
-.It Li req-io-type
-Count requests that are input or output requests.
-.It Li req-lock-type
-Count requests that lock the bus.
-.It Li req-lock-cache
-Count requests that lock the cache.
-.It Li req-split-type
-Count requests that is a bus 8-byte chunk that is split across an
-8-byte boundary.
-.It Li req-dem-type
-Count requests that are demand (not prefetches) if set.
-Count requests that are prefetches if not set.
-.It Li req-ord-type
-Count requests that are ordered.
-.It Li mem-type0 , Li mem-type1 , Li mem-type2
-Forms a 3-bit number that specifies a memory type encoding:
-.Pp
-.Bl -tag -width indent -compact
-.It Li 0
-UC
-.It Li 1
-USWC
-.It Li 4
-WT
-.It Li 5
-WP
-.It Li 6
-WB
-.El
-.Pp
-Bit
-.Dq Li mem-type2
-is the MSB of this 3-bit number.
-.El
-.Pp
-The default qualifier has all the above bits set.
-.Pp
-This event is usually used along with the
-.Dq Li edge
-qualifier to avoid multiple counting.
-.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
-.Pq "TS event"
-Count cache references as seen by the bus unit (2nd or 3rd level
-cache references).
-Qualifier
-.Ar qualifier
-is a
-.Ql +
-separated list of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li rd-2ndl-hits
-Count 2nd level cache hits in the shared state.
-.It Li rd-2ndl-hite
-Count 2nd level cache hits in the exclusive state.
-.It Li rd-2ndl-hitm
-Count 2nd level cache hits in the modified state.
-.It Li rd-3rdl-hits
-Count 3rd level cache hits in the shared state.
-.It Li rd-3rdl-hite
-Count 3rd level cache hits in the exclusive state.
-.It Li rd-3rdl-hitm
-Count 3rd level cache hits in the modified state.
-.It Li rd-2ndl-miss
-Count 2nd level cache misses.
-.It Li rd-3rdl-miss
-Count 3rd level cache misses.
-.It Li wr-2ndl-miss
-Count write-back lookups from the data access cache that miss the 2nd
-level cache.
-.El
-.Pp
-The default is to count all the above events.
-.It Li p4-execution-event Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count the retirement of tagged uops selected through the execution
-tagging mechanism.
-Qualifier
-.Ar flags
-can contain the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
-The marked uops are not bogus.
-.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
-The marked uops are bogus.
-.El
-.Pp
-This event requires additional (upstream) events to be allocated to
-perform the desired uop tagging.
-The default is to set all the above flags.
-This event can be used for precise event based sampling.
-.It Li p4-front-end-event Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count the retirement of tagged uops selected through the front-end
-tagging mechanism.
-Qualifier
-.Ar flags
-can contain the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li nbogus
-The marked uops are not bogus.
-.It Li bogus
-The marked uops are bogus.
-.El
-.Pp
-This event requires additional (upstream) events to be allocated to
-perform the desired uop tagging.
-The default is to select both kinds of events.
-This event can be used for precise event based sampling.
-.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count each DBSY or DRDY event selected by qualifier
-.Ar flags .
-Qualifier
-.Ar flags
-is a
-.Ql +
-separated set of the following flags:
-.Pp
-.Bl -tag -width indent -compact
-.It Li drdy-drv
-Count when this processor is driving data onto the bus.
-.It Li drdy-own
-Count when this processor is reading data from the bus.
-.It Li drdy-other
-Count when data is on the bus but not being sampled by this processor.
-.It Li dbsy-drv
-Count when this processor reserves the bus for use in the next cycle
-in order to drive data.
-.It Li dbsy-own
-Count when some agent reserves the bus for use in the next bus cycle
-to drive data that this processor will sample.
-.It Li dbsy-other
-Count when some agent reserves the bus for use in the next bus cycle
-to drive data that this processor will not sample.
-.El
-.Pp
-Flags
-.Dq Li drdy-own
-and
-.Dq Li drdy-other
-are mutually exclusive.
-Flags
-.Dq Li dbsy-own
-and
-.Dq Li dbsy-other
-are mutually exclusive.
-The default value for
-.Ar qualifier
-is
-.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
-.It Li p4-global-power-events Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count cycles during which the processor is not stopped.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li running
-Count cycles when the processor is active.
-.El
-.Pp
-.It Li p4-instr-retired Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count instructions retired during a clock cycle.
-Qualifer
-.Ar flags
-comprises of the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li nbogusntag
-Count non-bogus instructions that are not tagged.
-.It Li nbogustag
-Count non-bogus instructions that are tagged.
-.It Li bogusntag
-Count bogus instructions that are not tagged.
-.It Li bogustag
-Count bogus instructions that are tagged.
-.El
-.Pp
-The default qualifier counts all the above kinds of instructions.
-.It Li p4-ioq-active-entries Xo
-.Op Li ,mask= Ns Ar qualifier
-.Op Li ,busreqtype= Ns Ar req-type
-.Xc
-.Pq "TS event"
-Count the number of entries (clipped at 15) in the IOQ that are
-active.
-The event masks are specified by qualifier
-.Ar qualifier
-and
-.Ar req-type .
-.Pp
-Qualifier
-.Ar qualifier
-is a
-.Ql +
-separated set of the following flags:
-.Pp
-.Bl -tag -width indent -compact
-.It Li all-read
-Count read entries.
-.It Li all-write
-Count write entries.
-.It Li mem-uc
-Count entries accessing uncacheable memory.
-.It Li mem-wc
-Count entries accessing write-combining memory.
-.It Li mem-wt
-Count entries accessing write-through memory.
-.It Li mem-wp
-Count entries accessing write-protected memory
-.It Li mem-wb
-Count entries accessing write-back memory.
-.It Li own
-Count store requests driven by the processor (i.e., not by other
-processors or by DMA).
-.It Li other
-Count store requests driven by other processors or by DMA.
-.It Li prefetch
-Include hardware and software prefetch requests in the count.
-.El
-.Pp
-The default value for
-.Ar qualifier
-is to enable all the above flags.
-.Pp
-The
-.Ar req-type
-qualifier is a 5-bit number can be additionally used to select a
-specific bus request type.
-The default is 0.
-.Pp
-The
-.Dq Li edge
-qualifier should not be used when counting cycles with this event.
-The exact behaviour of this event depends on the processor revision.
-.It Li p4-ioq-allocation Xo
-.Op Li ,mask= Ns Ar qualifier
-.Op Li ,busreqtype= Ns Ar req-type
-.Xc
-.Pq "TS event"
-Count various types of transactions on the bus matching the flags set
-in
-.Ar qualifier
-and
-.Ar req-type .
-.Pp
-Qualifier
-.Ar qualifier
-is a
-.Ql +
-separated set of the following flags:
-.Pp
-.Bl -tag -width indent -compact
-.It Li all-read
-Count read entries.
-.It Li all-write
-Count write entries.
-.It Li mem-uc
-Count entries accessing uncacheable memory.
-.It Li mem-wc
-Count entries accessing write-combining memory.
-.It Li mem-wt
-Count entries accessing write-through memory.
-.It Li mem-wp
-Count entries accessing write-protected memory
-.It Li mem-wb
-Count entries accessing write-back memory.
-.It Li own
-Count store requests driven by the processor (i.e., not by other
-processors or by DMA).
-.It Li other
-Count store requests driven by other processors or by DMA.
-.It Li prefetch
-Include hardware and software prefetch requests in the count.
-.El
-.Pp
-The default value for
-.Ar qualifier
-is to enable all the above flags.
-.Pp
-The
-.Ar req-type
-qualifier is a 5-bit number can be additionally used to select a
-specific bus request type.
-The default is 0.
-.Pp
-The
-.Dq Li edge
-qualifier is normally used with this event to prevent multiple
-counting.
-The exact behaviour of this event depends on the processor revision.
-.It Li p4-itlb-reference Op mask= Ns Ar qualifier
-.Pq "TS event"
-Count translations using the intruction translation look-aside
-buffer.
-The
-.Ar qualifier
-argument is a list of the following strings separated by
-.Ql +
-characters.
-.Pp
-.Bl -tag -width indent -compact
-.It Li hit
-Count ITLB hits.
-.It Li miss
-Count ITLB misses.
-.It Li hit-uc
-Count uncacheable ITLB hits.
-.El
-.Pp
-If no
-.Ar qualifier
-is specified the default is to count all the three kinds of ITLB
-translations.
-.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
-.Pq "TS event"
-Count replayed events at the load port.
-Qualifier
-.Ar qualifier
-can take on one value:
-.Pp
-.Bl -tag -width indent -compact
-.It Li split-ld
-Count split loads.
-.El
-.Pp
-The default value for
-.Ar qualifier
-is
-.Dq Li split-ld .
-.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count mispredicted IA-32 branch instructions.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li nbogus
-Count non-bogus retired branch instructions.
-.El
-.It Li p4-machine-clear Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count the number of pipeline clears seen by the processor.
-Qualifer
-.Ar flags
-is a list of the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li clear
-Count for a portion of the many cycles when the machine is being
-cleared for any reason.
-.It Li moclear
-Count machine clears due to memory ordering issues.
-.It Li smclear
-Count machine clears due to self-modifying code.
-.El
-.Pp
-Use qualifier
-.Dq Li edge
-to get a count of occurrences of machine clears.
-The default qualifier is
-.Dq Li clear .
-.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
-.Pq "TS event"
-Count the cancelling of various kinds of requests in the data cache
-address control unit of the CPU.
-The qualifier
-.Ar event-list
-is a list of the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li st-rb-full
-Requests cancelled because no store request buffer was available.
-.It Li 64k-conf
-Requests that conflict due to 64K aliasing.
-.El
-.Pp
-If
-.Ar event-list
-is not specified, then the default is to count both kinds of events.
-.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
-.Pq "TS event"
-Count the completion of load split, store split, uncacheable split and
-uncacheable load operations selected by qualifier
-.Ar event-list .
-The qualifier
-.Ar event-list
-is a
-.Ql +
-separated list of the following flags:
-.Pp
-.Bl -tag -width indent -compact
-.It Li lsc
-Count load splits completed, excluding loads from uncacheable or
-write-combining areas.
-.It Li ssc
-Count any split stores completed.
-.El
-.Pp
-The default is to count both kinds of operations.
-.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
-.Pq "TS event"
-Count load replays triggered by the memory order buffer.
-Qualifier
-.Ar qualifier
-can be a
-.Ql +
-separated list of the following flags:
-.Pp
-.Bl -tag -width indent -compact
-.It Li no-sta
-Count replays because of unknown store addresses.
-.It Li no-std
-Count replays because of unknown store data.
-.It Li partial-data
-Count replays because of partially overlapped data accesses between
-load and store operations.
-.It Li unalgn-addr
-Count replays because of mismatches in the lower 4 bits of load and
-store operations.
-.El
-.Pp
-The default qualifier is
-.Ar no-sta+no-std+partial-data+unalgn-addr .
-.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count packed double-precision uops.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count all uops operating on packed double-precision operands.
-.El
-.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count packed single-precision uops.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count all uops operating on packed single-precision operands.
-.El
-.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
-.Pq "TI event"
-Count page walks performed by the page miss handler.
-Qualifier
-.Ar qualifier
-can be a
-.Ql +
-separated list of the following keywords:
-.Pp
-.Bl -tag -width indent -compact
-.It Li dtmiss
-Count page walks for data TLB misses.
-.It Li itmiss
-Count page walks for instruction TLB misses.
-.El
-.Pp
-The default value for
-.Ar qualifier
-is
-.Dq Li dtmiss+itmiss .
-.It Li p4-replay-event Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count the retirement of tagged uops selected through the replay
-tagging mechanism.
-Qualifier
-.Ar flags
-contains a
-.Ql +
-separated set of the following strings:
-.Pp
-.Bl -tag -width indent -compact
-.It Li nbogus
-The marked uops are not bogus.
-.It Li bogus
-The marked uops are bogus.
-.El
-.Pp
-This event requires additional (upstream) events to be allocated to
-perform the desired uop tagging.
-The default qualifier counts both kinds of uops.
-This event can be used for precise event based sampling.
-.It Li p4-resource-stall Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count the occurrence or latency of stalls in the allocator.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li sbfull
-A stall due to the lack of store buffers.
-.El
-.It Li p4-response
-.Pq "TI event"
-Count different types of responses.
-Further documentation on this event is not available.
-.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count branches retired.
-Qualifier
-.Ar flags
-contains a
-.Ql +
-separated list of strings:
-.Pp
-.Bl -tag -width indent -compact
-.It Li conditional
-Count conditional jumps.
-.It Li call
-Count direct and indirect call branches.
-.It Li return
-Count return branches.
-.It Li indirect
-Count returns, indirect calls or indirect jumps.
-.El
-.Pp
-The default qualifier counts all the above branch types.
-.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count mispredicted branches retired.
-Qualifier
-.Ar flags
-contains a
-.Ql +
-separated list of strings:
-.Pp
-.Bl -tag -width indent -compact
-.It Li conditional
-Count conditional jumps.
-.It Li call
-Count indirect call branches.
-.It Li return
-Count return branches.
-.It Li indirect
-Count returns, indirect calls or indirect jumps.
-.El
-.Pp
-The default qualifier counts all the above branch types.
-.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count the number of scalar double-precision uops.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count the number of scalar double-precision uops.
-.El
-.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count the number of scalar single-precision uops.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count all uops operating on scalar single-precision operands.
-.El
-.It Li p4-snoop
-.Pq "TI event"
-Count snoop traffic.
-Further documentation on this event is not available.
-.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count the number of times an assist is required to handle problems
-with the operands for SSE and SSE2 operations.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count assists for all SSE and SSE2 uops.
-.El
-.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
-.Pq "TS event"
-Count events replayed at the store port.
-Qualifier
-.Ar qualifier
-can take on one value:
-.Pp
-.Bl -tag -width indent -compact
-.It Li split-st
-Count split stores.
-.El
-.Pp
-The default value for
-.Ar qualifier
-is
-.Dq Li split-st .
-.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
-.Pq "TI event"
-Count the duration in cycles of operating modes of the trace cache and
-decode engine.
-The desired operating mode is selected by
-.Ar qualifier ,
-which is a list of the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li DD
-Both logical processors are in deliver mode.
-.It Li DB
-Logical processor 0 is in deliver mode while logical processor 1 is in
-build mode.
-.It Li DI
-Logical processor 0 is in deliver mode while logical processor 1 is
-halted, or in machine clear, or transitioning to a long microcode
-flow.
-.It Li BD
-Logical processor 0 is in build mode while logical processor 1 is in
-deliver mode.
-.It Li BB
-Both logical processors are in build mode.
-.It Li BI
-Logical processor 0 is in build mode while logical processor 1 is
-halted, or in machine clear or transitioning to a long microcode
-flow.
-.It Li ID
-Logical processor 0 is halted, or in machine clear or transitioning to
-a long microcode flow while logical processor 1 is in deliver mode.
-.It Li IB
-Logical processor 0 is halted, or in machine clear or transitioning to
-a long microcode flow while logical processor 1 is in build mode.
-.El
-.Pp
-If there is only one logical processor in the processor package then
-the qualifier for logical processor 1 is ignored.
-If no qualifier is specified, the default qualifier is
-.Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
-.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count the number of times uop delivery changed from the trace cache to
-MS ROM.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li cisc
-Count TC to MS transfers.
-.El
-.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count the number of valid uops written to the uop queue.
-Qualifier
-.Ar flags
-is a list of the following strings, separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li from-tc-build
-Count uops being written from the trace cache in build mode.
-.It Li from-tc-deliver
-Count uops being written from the trace cache in deliver mode.
-.It Li from-rom
-Count uops being written from microcode ROM.
-.El
-.Pp
-The default qualifier counts all the above kinds of uops.
-.It Li p4-uop-type Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-This event is used in conjunction with the front-end at-retirement
-mechanism to tag load and store uops.
-Qualifer
-.Ar flags
-comprises the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li tagloads
-Mark uops that are load operations.
-.It Li tagstores
-Mark uops that are store operations.
-.El
-.Pp
-The default qualifier counts both kinds of uops.
-.It Li p4-uops-retired Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count uops retired during a clock cycle.
-Qualifier
-.Ar flags
-comprises the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li nbogus
-Count marked uops that are not bogus.
-.It Li bogus
-Count marked uops that are bogus.
-.El
-.Pp
-The default qualifier counts both kinds of uops.
-.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count write-combining buffer operations.
-Qualifier
-.Ar flags
-contains the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li wcb-evicts
-WC buffer evictions due to any cause.
-.It Li wcb-full-evict
-WC buffer evictions due to no WC buffer being available.
-.El
-.Pp
-The default qualifer counts both kinds of evictions.
-.It Li p4-x87-assist Op Li ,mask= Ns Ar flags
-.Pq "TS event"
-Count the retirement of x87 instructions that required special
-handling.
-Qualifier
-.Ar flags
-contains the following strings separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li fpsu
-Count instructions that saw an FP stack underflow.
-.It Li fpso
-Count instructions that saw an FP stack overflow.
-.It Li poao
-Count instructions that saw an x87 output overflow.
-.It Li poau
-Count instructions that saw an x87 output underflow.
-.It Li prea
-Count instructions that needed an x87 input assist.
-.El
-.Pp
-The default qualifier counts all the above types of instruction
-retirements.
-.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count x87 floating-point uops.
-Qualifier
-.Ar flags
-can take the following value (which is also the default):
-.Pp
-.Bl -tag -width indent -compact
-.It Li all
-Count all x87 floating-point uops.
-.El
-.Pp
-If an instruction contains more than one x87 floating-point uops, then
-all x87 floating-point uops will be counted.
-This event does not count x87 floating-point data movement operations.
-.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags
-.Pq "TI event"
-Count each x87 FPU, MMX, SSE, or SSE2 uops that load data or store
-data or perform register-to-register moves.
-This event does not count integer move uops.
-Qualifier
-.Ar flags
-may contain the following keywords separated by
-.Ql +
-characters:
-.Pp
-.Bl -tag -width indent -compact
-.It Li allp0
-Count all x87 and SIMD store and move uops.
-.It Li allp2
-Count all x87 and SIMD load uops.
-.El
.Pp
-The default is to count all uops.
-.Pq Errata
-This event may be affected by processor errata N43.
+.Ss PMC Architecture Dependent Events
+PMC architecture dependent event specifiers are described in their own
+individual manual pages:
+.Bl -column " PMC_CLASS_TSC " "MANUAL PAGE "
+.It Em "PMC Class" Ta Em "Manual Page"
+.It Li PMC_CLASS_K7 Ta Xr pmc.k7 3
+.It Li PMC_CLASS_K8 Ta Xr pmc.k8 3
+.It Li PMC_CLASS_P4 Ta Xr pmc.p4 3
+.It Li PMC_CLASS_P5 Ta Xr pmc.p5 3
+.It Li PMC_CLASS_P6 Ta Xr pmc.p6 3
+.It Li PMC_CLASS_TSC Ta Xr pmc.tsc 3
.El
-.Ss "Cascading P4 PMCs"
-PMC cascading support is currently poorly implemented.
-While individual event counters may be allocated with a
-.Dq Li cascade
-qualifier, the current API does not offer the ability
-to name and allocate all the resources needed for a
-cascaded event counter pair in a single operation.
-.Ss "Precise Event Based Sampling"
-Support for precise event based sampling is currently
-unimplemented.
.Sh COMPATIBILITY
The interface between the
.Nm pmc
@@ -3542,6 +461,12 @@ The
API is
.Ud
.Sh SEE ALSO
+.Xr pmc.k7 3 ,
+.Xr pmc.k8 3 ,
+.Xr pmc.p4 3 ,
+.Xr pmc.p5 3 ,
+.Xr pmc.p6 3 ,
+.Xr pmc.tsc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,