diff --git a/doc/howto_libipt.md b/doc/howto_libipt.md
new file mode 100644
index 000000000000..3d3c12f0bb16
--- /dev/null
+++ b/doc/howto_libipt.md
@@ -0,0 +1,1271 @@
+Decoding Intel(R) Processor Trace Using libipt {#libipt}
+========================================================
+
+<!---
+ ! Copyright (c) 2013-2019, Intel Corporation
+ !
+ ! Redistribution and use in source and binary forms, with or without
+ ! modification, are permitted provided that the following conditions are met:
+ !
+ ! * Redistributions of source code must retain the above copyright notice,
+ ! this list of conditions and the following disclaimer.
+ ! * Redistributions in binary form must reproduce the above copyright notice,
+ ! this list of conditions and the following disclaimer in the documentation
+ ! and/or other materials provided with the distribution.
+ ! * Neither the name of Intel Corporation nor the names of its contributors
+ ! may be used to endorse or promote products derived from this software
+ ! without specific prior written permission.
+ !
+ ! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ ! AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ ! IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ ! ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ ! LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ ! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ ! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ ! INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ ! CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ ! ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ ! POSSIBILITY OF SUCH DAMAGE.
+ !-->
+
+This chapter describes how to use libipt for various tasks around Intel
+Processor Trace (Intel PT). For code examples, refer to the sample tools that
+are contained in the source tree:
+
+  * *ptdump*     A packet dumper example.
+  * *ptxed*      A control-flow reconstruction example.
+  * *pttc*       A packet encoder example.
+
+
+For detailed information about Intel PT, please refer to the respective chapter
+in Volume 3 of the Intel Software Developer's Manual at
+http://www.intel.com/sdm.
+
+
+## Introduction
+
+The libipt decoder library provides multiple layers of abstraction ranging from
+packet encoding and decoding to full execution flow reconstruction. The layers
+are organized as follows:
+
+  * *packets*             This layer deals with raw Intel PT packets.
+
+  * *events*              This layer deals with packet combinations that
+                          encode higher-level events.
+
+  * *instruction flow*    This layer deals with the execution flow on the
+                          instruction level.
+
+  * *block*               This layer deals with the execution flow on the
+                          level of blocks of sequential instructions.
+
+                          It is faster than the instruction flow decoder but
+                          requires a small amount of post-processing.
+
+
+Each layer provides its own encoder or decoder struct plus a set of functions
+for allocating and freeing encoder or decoder objects and for synchronizing
+decoders onto the Intel PT packet stream. Function names are prefixed with
+`pt_<lyr>_` where `<lyr>` is an abbreviation of the layer name. The following
+abbreviations are used:
+
+  * *enc*     Packet encoding (packet layer).
+  * *pkt*     Packet decoding (packet layer).
+  * *qry*     Event (or query) layer.
+  * *insn*    Instruction flow layer.
+  * *blk*     Block layer.
+
+
+Here is some generic example code for working with decoders:
+
+~~~{.c}
+    struct pt_<layer>_decoder *decoder;
+    struct pt_config config;
+    int errcode;
+
+    memset(&config, 0, sizeof(config));
+    config.size = sizeof(config);
+    config.begin = <pt buffer begin>;
+    config.end = <pt buffer end>;
+    config.cpu = <cpu identifier>;
+    config...
+
+    /* Allocation reports failure as NULL; there is no error code. */
+    decoder = pt_<lyr>_alloc_decoder(&config);
+    if (!decoder)
+        <handle allocation error>();
+
+    errcode = pt_<lyr>_sync_<where>(decoder);
+    if (errcode < 0)
+        <handle error>(errcode);
+
+    <use decoder>(decoder);
+
+    pt_<lyr>_free_decoder(decoder);
+~~~
+
+First, configure the decoder. As a minimum, the size of the config struct and
+the `begin` and `end` of the buffer containing the Intel PT data need to be set.
+In the case of packet encoding, `begin` and `end` are the bounds of the
+pre-allocated buffer into which Intel PT packets shall be written.
+Configuration options will be discussed in detail later in this chapter.
+
+Next, allocate a decoder object for the layer you are interested in. A return
+value of NULL indicates an error. There is no further information available on
+the exact error condition. Most of the time, however, the error is the result
+of an incomplete or inconsistent configuration.
+
+Before the decoder can be used, it needs to be synchronized onto the Intel PT
+packet stream specified in the configuration. The only exception to this is the
+packet encoder, which is implicitly synchronized onto the beginning of the Intel
+PT buffer.
+
+Depending on the type of decoder, one or more synchronization options are
+available.
+
+  * `pt_<lyr>_sync_forward()`     Synchronize onto the next PSB in forward
+                                  direction (or the first PSB if not yet
+                                  synchronized).
+
+  * `pt_<lyr>_sync_backward()`    Synchronize onto the next PSB in backward
+                                  direction (or the last PSB if not yet
+                                  synchronized).
+
+  * `pt_<lyr>_sync_set()`         Set the synchronization position to a
+                                  user-defined location in the Intel PT packet
+                                  stream.
+                                  There is no check whether the specified
+                                  location makes sense or is valid.
+
+
+After synchronizing, the decoder can be used. While decoding, the decoder
+stores the location of the last PSB it encountered during normal decode.
+Subsequent calls to `pt_<lyr>_sync_forward()` will start searching from that
+location. This is useful for re-synchronizing onto the Intel PT packet stream
+in case of errors. An example of a typical decode loop is given below:
+
+~~~{.c}
+    for (;;) {
+        int errcode;
+
+        errcode = <use decoder>(decoder);
+        if (errcode >= 0)
+            continue;
+
+        if (errcode == -pte_eos)
+            return;
+
+        <report error>(errcode);
+
+        do {
+            errcode = pt_<lyr>_sync_forward(decoder);
+
+            if (errcode == -pte_eos)
+                return;
+        } while (errcode < 0);
+    }
+~~~
+
+You can get the current decoder position as an offset into the Intel PT buffer
+via:
+
+    pt_<lyr>_get_offset()
+
+
+You can get the position of the last synchronization point as an offset into the
+Intel PT buffer via:
+
+    pt_<lyr>_get_sync_offset()
+
+
+Each layer will be discussed in detail below. In the remainder of this section,
+general functionality will be considered.
+
+
+### Version
+
+You can query the library version using:
+
+ * `pt_library_version()`
+
+
+This function returns a version structure that can be used for compatibility
+checks or simply for reporting the version of the decoder library.
+
+
+### Errors
+
+The library uses a single error enum for all layers.
+
+ * `enum pt_error_code` An enumeration of encode and decode errors.
+
+
+Errors are typically represented as negative pt_error_code enumeration constants
+and returned as an int. The library provides two functions for dealing with
+errors:
+
+  * `pt_errcode()`    Translate an int return value into a `pt_error_code`
+                      enumeration constant.
+
+  * `pt_errstr()`     Return a human-readable error string.
+
+
+Not all errors may occur on every layer. Every API function specifies the
+errors it may return.
+
+
+### Configuration
+
+Every encoder or decoder allocation function requires a configuration argument.
+Some of its fields have already been discussed in the example above. Refer to
+the `intel-pt.h` header for detailed and up-to-date documentation of each field.
+
+As a minimum, the `size` field needs to be set to `sizeof(struct pt_config)` and
+`begin` and `end` need to be set to the Intel PT buffer to use.
+
+The size is used for detecting library version mismatches and to provide
+backwards compatibility. Without the proper `size`, decoder allocation will
+fail.
+
+Although not strictly required, it is recommended to also set the `cpu` field to
+the processor on which Intel PT has been collected (for decoders) or for which
+Intel PT shall be generated (for encoders). This allows implementing
+processor-specific behavior such as erratum workarounds.
+
+
+## The Packet Layer
+
+This layer deals with Intel PT packet encoding and decoding. It can further be
+split into three sub-layers: opcodes, encoding, and decoding.
+
+
+### Opcodes
+
+The opcodes layer provides enumerations for all the bits necessary for Intel PT
+encoding and decoding. The enumeration constants can be used without linking to
+the decoder library. There is no encoder or decoder struct associated with this
+layer. See the `intel-pt.h` header file for details.
+
+
+### Packet Encoding
+
+The packet encoding layer provides support for encoding Intel PT
+packet-by-packet. Start by configuring and allocating a `pt_encoder` as
+shown below:
+
+~~~{.c}
+    struct pt_encoder *encoder;
+    struct pt_config config;
+
+    memset(&config, 0, sizeof(config));
+    config.size = sizeof(config);
+    config.begin = <pt buffer begin>;
+    config.end = <pt buffer end>;
+    config.cpu = <cpu identifier>;
+
+    /* Allocation reports failure as NULL; there is no error code. */
+    encoder = pt_alloc_encoder(&config);
+    if (!encoder)
+        <handle allocation error>();
+~~~
+
+For packet encoding, only the mandatory config fields need to be filled in.
+
+The allocated encoder object will be implicitly synchronized onto the beginning
+of the Intel PT buffer. You may change the encoder's position at any time by
+calling `pt_enc_sync_set()` with the desired buffer offset.
+
+Next, fill in a `pt_packet` object with details about the packet to be encoded.
+You do not need to fill in the `size` field. The needed size is computed by the
+encoder. There is no consistency check with the size specified in the packet
+object. The following example encodes a TIP packet:
+
+~~~{.c}
+    struct pt_encoder *encoder = ...;
+    struct pt_packet packet;
+    int errcode;
+
+    packet.type = ppt_tip;
+    packet.payload.ip.ipc = pt_ipc_update_16;
+    packet.payload.ip.ip = <ip>;
+~~~
+
+For IP packets, for example FUP or TIP.PGE, there is no need to mask out bits in
+the `ip` field that will not be encoded in the packet due to the specified IP
+compression in the `ipc` field. The encoder will ignore them.
+
+There are no consistency checks whether the specified IP compression in the
+`ipc` field is allowed in the current context or whether decode will result in
+the full IP specified in the `ip` field.
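+
+The effect of IP compression on the `ip` field can be pictured with a small,
+self-contained sketch. This is not libipt code: the `sketch_ipc` enumeration
+and `sketch_payload_bits()` are hypothetical names, and the payload widths of
+16, 32, 48, and 64 bits are an assumption taken from the IP compression
+description in the Intel SDM.
+
+~~~{.c}
+    #include <stdint.h>
+
+    /* Hypothetical mirror of the IP compression modes.  The names are made
+     * up for this sketch and are not libipt identifiers.
+     */
+    enum sketch_ipc {
+        sketch_ipc_suppressed,  /* no IP payload */
+        sketch_ipc_update_16,   /* low 16 bits */
+        sketch_ipc_update_32,   /* low 32 bits */
+        sketch_ipc_48,          /* low 48 bits */
+        sketch_ipc_full         /* all 64 bits */
+    };
+
+    /* Return the bits of ip that would end up in the packet payload. */
+    static uint64_t sketch_payload_bits(enum sketch_ipc ipc, uint64_t ip)
+    {
+        switch (ipc) {
+        case sketch_ipc_suppressed:
+            return 0ull;
+        case sketch_ipc_update_16:
+            return ip & 0xffffull;
+        case sketch_ipc_update_32:
+            return ip & 0xffffffffull;
+        case sketch_ipc_48:
+            return ip & 0xffffffffffffull;
+        case sketch_ipc_full:
+            return ip;
+        }
+
+        return 0ull;
+    }
+~~~
+
+With a 16-bit update, for instance, only the low two bytes of `ip` survive.
+A decoder reconstructs the full IP from those bytes and the previously decoded
+IP, which is why the encoder cannot check that the result matches `ip`.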
+
+Once the packet object has been filled, it can be handed over to the encoder as
+shown here:
+
+~~~{.c}
+    errcode = pt_enc_next(encoder, &packet);
+    if (errcode < 0)
+        <handle error>(errcode);
+~~~
+
+The encoder will encode the packet, write it into the Intel PT buffer, and
+advance its position to the next byte after the packet. On a successful encode,
+it will return the number of bytes that have been written. In case of errors,
+nothing will be written and the encoder returns a negative error code.
+
+
+### Packet Decoding
+
+The packet decoding layer provides support for decoding Intel PT
+packet-by-packet. Start by configuring and allocating a `pt_packet_decoder` as
+shown here:
+
+~~~{.c}
+    struct pt_packet_decoder *decoder;
+    struct pt_config config;
+
+    memset(&config, 0, sizeof(config));
+    config.size = sizeof(config);
+    config.begin = <pt buffer begin>;
+    config.end = <pt buffer end>;
+    config.cpu = <cpu identifier>;
+    config.decode.callback = <decode function>;
+    config.decode.context = <decode context>;
+
+    /* Allocation reports failure as NULL; there is no error code. */
+    decoder = pt_pkt_alloc_decoder(&config);
+    if (!decoder)
+        <handle allocation error>();
+~~~
+
+For packet decoding, an optional decode callback function may be specified in
+addition to the mandatory config fields. If specified, the callback function
+will be called for packets the decoder does not know about. If there is no
+decode callback specified, the decoder will return `-pte_bad_opc`. In addition
+to the callback function pointer, an optional pointer to user-defined context
+information can be specified. This context will be passed to the decode
+callback function.
+
+Before the decoder can be used, it needs to be synchronized onto the Intel PT
+packet stream. Packet decoders offer three synchronization functions. To
+iterate over synchronization points in the Intel PT packet stream in forward or
+backward direction, use one of the following two functions respectively:
+
+    pt_pkt_sync_forward()
+    pt_pkt_sync_backward()
+
+
+To manually synchronize the decoder at a particular offset into the Intel PT
+packet stream, use the following function:
+
+    pt_pkt_sync_set()
+
+
+There are no checks to ensure that the specified offset is at the beginning of a
+packet. The example below shows synchronization to the first synchronization
+point:
+
+~~~{.c}
+    struct pt_packet_decoder *decoder;
+    int errcode;
+
+    errcode = pt_pkt_sync_forward(decoder);
+    if (errcode < 0)
+        <handle error>(errcode);
+~~~
+
+The decoder will remember the last synchronization packet it decoded.
+Subsequent calls to `pt_pkt_sync_forward()` and `pt_pkt_sync_backward()` will
+use this as their starting point.
+
+You can get the current decoder position as an offset into the Intel PT buffer
+via:
+
+    pt_pkt_get_offset()
+
+
+You can get the position of the last synchronization point as an offset into the
+Intel PT buffer via:
+
+    pt_pkt_get_sync_offset()
+
+
+Once the decoder is synchronized, you can iterate over packets by repeated calls
+to `pt_pkt_next()` as shown in the following example:
+
+~~~{.c}
+    struct pt_packet_decoder *decoder;
+    int errcode;
+
+    for (;;) {
+        struct pt_packet packet;
+
+        errcode = pt_pkt_next(decoder, &packet, sizeof(packet));
+        if (errcode < 0)
+            break;
+
+        <process packet>(&packet);
+    }
+~~~
+
+
+## The Event Layer
+
+The event layer deals with packet combinations that encode higher-level events.
+It is used for reconstructing execution flow by users who need finer-grained
+control than the instruction flow layer offers or who want to integrate
+execution flow reconstruction with other functionality more tightly than would
+otherwise be possible.
+
+This section describes how to use the query decoder for reconstructing execution
+flow. See the instruction flow decoder for an example. Start by configuring and
+allocating a `pt_query_decoder` as shown below:
+
+~~~{.c}
+    struct pt_query_decoder *decoder;
+    struct pt_config config;
+
+    memset(&config, 0, sizeof(config));
+    config.size = sizeof(config);
+    config.begin = <pt buffer begin>;
+    config.end = <pt buffer end>;
+    config.cpu = <cpu identifier>;
+    config.decode.callback = <decode function>;
+    config.decode.context = <decode context>;
+
+    /* Allocation reports failure as NULL; there is no error code. */
+    decoder = pt_qry_alloc_decoder(&config);
+    if (!decoder)
+        <handle allocation error>();
+~~~
+
+An optional packet decode callback function may be specified in addition to the
+mandatory config fields. If specified, the callback function will be called for
+packets the decoder does not know about. The query decoder will ignore the
+unknown packet except for its size in order to skip it. If there is no decode
+callback specified, the decoder will abort with `-pte_bad_opc`. In addition to
+the callback function pointer, an optional pointer to user-defined context
+information can be specified. This context will be passed to the decode
+callback function.
+
+Before the decoder can be used, it needs to be synchronized onto the Intel PT
+packet stream. To iterate over synchronization points in the Intel PT packet
+stream in forward or backward direction, the query decoders offer the following
+two synchronization functions respectively:
+
+    pt_qry_sync_forward()
+    pt_qry_sync_backward()
+
+
+To manually synchronize the decoder at a synchronization point (i.e. PSB packet)
+in the Intel PT packet stream, use the following function:
+
+    pt_qry_sync_set()
+
+
+After successfully synchronizing, the query decoder will start reading the PSB+
+header to initialize its internal state. If tracing is enabled at this
+synchronization point, the IP of the instruction at which decoding should start
+is returned. If tracing is disabled at this synchronization point, this is
+indicated in the returned status bits (see below). The following example shows
+synchronization to the first synchronization point:
+
+~~~{.c}
+    struct pt_query_decoder *decoder;
+    uint64_t ip;
+    int status;
+
+    status = pt_qry_sync_forward(decoder, &ip);
+    if (status < 0)
+        <handle error>(status);
+~~~
+
+In addition to a query decoder, you will need an instruction decoder for
+decoding and classifying instructions.
+
+
+#### In A Nutshell
+
+After synchronizing, you begin decoding instructions starting at the returned
+IP. As long as you can determine the next instruction in execution order, you
+continue on your own. Only when the next instruction cannot be determined by
+examining the current instruction do you ask the query decoder for guidance:
+
+ * If the current instruction is a conditional branch, the
+ `pt_qry_cond_branch()` function will tell whether it was taken.
+
+ * If the current instruction is an indirect branch, the
+ `pt_qry_indirect_branch()` function will provide the IP of its destination.
+
+
+~~~{.c}
+    struct pt_query_decoder *decoder;
+    uint64_t ip;
+
+    for (;;) {
+        struct <instruction> insn;
+
+        insn = <decode instruction>(ip);
+
+        ip += <instruction size>(insn);
+
+        if (<is cond branch>(insn)) {
+            int status, taken;
+
+            status = pt_qry_cond_branch(decoder, &taken);
+            if (status < 0)
+                <handle error>(status);
+
+            if (taken)
+                ip += <branch displacement>(insn);
+        } else if (<is indirect branch>(insn)) {
+            int status;
+
+            status = pt_qry_indirect_branch(decoder, &ip);
+            if (status < 0)
+                <handle error>(status);
+        }
+    }
+~~~
+
+
+Certain aspects, such as asynchronous events or synchronizing at a location
+where tracing is disabled, have been ignored so far. Let us consider them now.
+
+
+#### Queries
+
+The query decoder provides four query functions:
+
+  * `pt_qry_cond_branch()`        Query whether the next conditional branch was
+                                  taken.
+
+  * `pt_qry_indirect_branch()`    Query for the destination IP of the next
+                                  indirect branch.
+
+  * `pt_qry_event()`              Query for the next event.
+
+  * `pt_qry_time()`               Query for the current time.
+
+
+Each function returns either a non-negative vector of status bits or a negative
+error code. For details on status bits and error conditions, please refer to
+the `pt_status_flag` and `pt_error_code` enumerations in the `intel-pt.h` header.
+
+The `pts_ip_suppressed` status bit is used to indicate that no IP is available
+at functions that are supposed to return an IP. Examples are the indirect
+branch query function and the synchronization functions.
+
+The `pts_event_pending` status bit is used to indicate that there is an event
+pending. You should query for this event before continuing execution flow
+reconstruction.
+
+The `pts_eos` status bit is used to indicate the end of the trace. Any
+subsequent query will return `-pte_eos`.
+
+
+#### Events
+
+Events are signaled ahead of time. When you query for pending events as soon as
+they are indicated, you will be aware of asynchronous events before you reach
+the instruction associated with the event.
+
+For example, if tracing is disabled at the synchronization point, the IP will be
+suppressed. In this case, it is very likely that a tracing enabled event is
+signaled. You will also get events for initializing the decoder state after
+synchronizing onto the Intel PT packet stream, for example paging or execution
+mode events.
+
+See the `enum pt_event_type` and `struct pt_event` in the intel-pt.h header for
+details on possible events. This document does not give an example of event
+processing. Refer to the implementation of the instruction flow decoder in
+pt_insn.c for details.
+
+
+#### Timing
+
+To be able to signal events, the decoder reads ahead until it arrives at a
+query-relevant packet. Errors encountered during that time will be postponed
+until the respective query call. This reading ahead affects timing. The decoder
+will always be a few packets ahead. When querying for the current time, the
+query will return the time at the decoder's current packet. This corresponds to
+the time at your next query.
+
+
+#### Return Compression
+
+If Intel PT has been configured to compress returns, a successfully compressed
+return is represented as a conditional branch instead of an indirect branch.
+For a RET instruction, you first query for a conditional branch. If the query
+succeeds, it should indicate that the branch was taken. In that case, the
+return has been compressed. A not-taken branch indicates an error. If the
+query fails, the return has not been compressed and you query for an indirect
+branch.
+
+There is no guarantee that returns will be compressed. Even though return
+compression has been enabled, returns may still be represented as indirect
+branches.
+
+To reconstruct the execution flow for compressed returns, you would maintain a
+stack of return addresses. For each call instruction, push the IP of the
+instruction following the call onto the stack. For compressed returns, pop the
+topmost IP from the stack. See pt_retstack.h and pt_retstack.c for a sample
+implementation.
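+
+The push and pop steps described above can be sketched as a small fixed-depth
+stack. This is a minimal illustration under stated assumptions and not the
+code in pt_retstack.c: the `retstack` names, the depth of 64 entries, and the
+choice to overwrite the oldest entry when the stack is full are all
+hypothetical.
+
+~~~{.c}
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* Hypothetical fixed-depth return-address stack for this sketch. */
+    enum { RETSTACK_DEPTH = 64 };
+
+    struct retstack {
+        uint64_t ip[RETSTACK_DEPTH];
+        size_t top;    /* index of the next free slot */
+        size_t count;  /* number of valid entries */
+    };
+
+    static void retstack_init(struct retstack *rs)
+    {
+        rs->top = 0;
+        rs->count = 0;
+    }
+
+    /* On a call: remember the IP of the instruction following the call.
+     * When the stack is full, the oldest entry is overwritten.
+     */
+    static void retstack_push(struct retstack *rs, uint64_t ret_ip)
+    {
+        rs->ip[rs->top] = ret_ip;
+        rs->top = (rs->top + 1) % RETSTACK_DEPTH;
+        if (rs->count < RETSTACK_DEPTH)
+            rs->count += 1;
+    }
+
+    /* On a compressed return: pop the most recent return address.
+     * Returns 0 on success and -1 if the stack is empty.
+     */
+    static int retstack_pop(struct retstack *rs, uint64_t *ret_ip)
+    {
+        if (rs->count == 0)
+            return -1;
+
+        rs->top = (rs->top + RETSTACK_DEPTH - 1) % RETSTACK_DEPTH;
+        rs->count -= 1;
+        *ret_ip = rs->ip[rs->top];
+
+        return 0;
+    }
+~~~
+
+In a decode loop, you would call `retstack_push()` for each call instruction
+and `retstack_pop()` when the conditional branch query reports a taken branch
+for a RET instruction. An empty stack at that point means the trace and the
+reconstructed call history disagree and should be treated as an error.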
+
+
+## The Instruction Flow Layer
+
+The instruction flow layer provides a simple API for iterating over instructions
+in execution order. Start by configuring and allocating a `pt_insn_decoder` as
+shown below:
+
+~~~{.c}
+    struct pt_insn_decoder *decoder;
+    struct pt_config config;
+
+    memset(&config, 0, sizeof(config));
+    config.size = sizeof(config);
+    config.begin = <pt buffer begin>;
+    config.end = <pt buffer end>;
+    config.cpu = <cpu identifier>;
+    config.decode.callback = <decode function>;
+    config.decode.context = <decode context>;
+
+    /* Allocation reports failure as NULL; there is no error code. */
+    decoder = pt_insn_alloc_decoder(&config);
+    if (!decoder)
+        <handle allocation error>();
+~~~
+
+An optional packet decode callback function may be specified in addition to the
+mandatory config fields. If specified, the callback function will be called for
+packets the decoder does not know about. The decoder will ignore the unknown
+packet except for its size in order to skip it. If there is no decode callback
+specified, the decoder will abort with `-pte_bad_opc`. In addition to the
+callback function pointer, an optional pointer to user-defined context
+information can be specified. This context will be passed to the decode
+callback function.
+
+The decoder starts with an empty default image that can be populated later on
+and that is implicitly destroyed when the decoder is freed. See below for more
+information on this.
+
+
+#### The Traced Image
+
+In addition to the Intel PT configuration, the instruction flow decoder needs to
+know the memory image for which Intel PT has been recorded. This memory image
+is represented by a `pt_image` object. If decoding failed due to an IP lying
+outside of the traced memory image, `pt_insn_next()` will return `-pte_nomap`.
+
+Use `pt_image_alloc()` to allocate and `pt_image_free()` to free an image. Use
+these functions to prepare an image in advance or if you want to switch between
+images. Images may not be shared. Every decoder must use a different image.
+
+Every decoder provides an empty default image that is used unless another image
+is set via `pt_insn_set_image()`. The default image is implicitly destroyed
+when the decoder is freed. It can be obtained by calling `pt_insn_get_image()`.
+Use this if you only use one decoder and one image.
+
+An image is a collection of contiguous, non-overlapping memory regions called
+`sections`. Starting with an empty image, it may be populated with repeated
+calls to `pt_image_add_file()` or `pt_image_add_cached()`, one for each section,
+or with a call to `pt_image_copy()` to add all sections from another image. If
+a newly added section overlaps with an existing section, the existing section
+will be truncated or split to make room for the new section.
+
+In some cases, the memory image may change during the execution. You can use
+the `pt_image_remove_by_filename()` function to remove previously added sections
+by their file name and `pt_image_remove_by_asid()` to remove all sections for an
+address-space.
+
+In addition to adding sections, you can register a callback function for reading
+memory using `pt_image_set_callback()`. The `context` parameter you pass
+together with the callback function pointer will be passed to your callback
+function every time it is called. There can only be one callback at any time.
+Adding a new callback will remove any previously added callback. To remove the
+callback function, pass `NULL` to `pt_image_set_callback()`.
+
+Callback and files may be combined. The callback function is used whenever
+the memory cannot be found in any of the image's sections.
+
+If more than one process is traced, the memory image may change when the process
+context is switched. To simplify handling this case, an address-space
+identifier may be passed to each of the above functions to define separate
+images for different processes at the same time. The decoder will select the
+correct image based on context switch information in the Intel PT trace. If
+you want to manage this on your own, you can use `pt_insn_set_image()` to
+replace the image a decoder uses.
+
+
+#### The Traced Image Section Cache
+
+When using multiple decoders that work on related memory images, it is desirable
+to share image sections between decoders. The underlying file sections will be
+mapped only once per image section cache.
+
+Use `pt_iscache_alloc()` to allocate and `pt_iscache_free()` to free an image
+section cache. Freeing the cache does not destroy sections added to the cache.
+They remain valid until they are no longer used.
+
+Use `pt_iscache_add_file()` to add a file section to an image section cache.
+The function returns an image section identifier (ISID) that uniquely identifies
+the section in this cache. Use `pt_image_add_cached()` to add a file section
+from an image section cache to an image.
+
+Multiple image section caches may be used at the same time but it is recommended
+not to mix sections from different image section caches in one image.
+
+A traced image section cache can also be used for reading an instruction's
+memory via its IP and ISID as provided in `struct pt_insn`.
+
+The image section cache provides a cache of recently mapped sections and keeps
+them mapped when they are unmapped by the images that used them. This avoids
+repeated unmapping and re-mapping of image sections in some parallel debug
+scenarios or when reading memory from the image section cache.
+
+Use `pt_iscache_set_limit()` to set the limit of this cache in bytes. This
+accounts for the extra memory that will be used for keeping image sections
+mapped including any block caches associated with image sections. To disable
+caching, set the limit to zero.
+
+
+#### Synchronizing
+
+Before the decoder can be used, it needs to be synchronized onto the Intel PT
+packet stream. To iterate over synchronization points in the Intel PT packet
+stream in forward or backward directions, the instruction flow decoders offer
+the following two synchronization functions respectively:
+
+    pt_insn_sync_forward()
+    pt_insn_sync_backward()
+
+
+To manually synchronize the decoder at a synchronization point (i.e. PSB packet)
+in the Intel PT packet stream, use the following function:
+
+    pt_insn_sync_set()
+
+
+The example below shows synchronization to the first synchronization point:
+
+~~~{.c}
+    struct pt_insn_decoder *decoder;
+    int errcode;
+
+    errcode = pt_insn_sync_forward(decoder);
+    if (errcode < 0)
+        <handle error>(errcode);
+~~~
+
+The decoder will remember the last synchronization packet it decoded.
+Subsequent calls to `pt_insn_sync_forward()` and `pt_insn_sync_backward()` will
+use this as their starting point.
+
+You can get the current decoder position as an offset into the Intel PT buffer
+via:
+
+    pt_insn_get_offset()
+
+
+You can get the position of the last synchronization point as an offset into the
+Intel PT buffer via:
+
+    pt_insn_get_sync_offset()
+
+
+#### Iterating
+
+Once the decoder is synchronized, you can iterate over instructions in execution
+flow order by repeated calls to `pt_insn_next()` as shown in the following
+example:
+
+~~~{.c}
+    struct pt_insn_decoder *decoder;
+    int status;
+
+    for (;;) {
+        struct pt_insn insn;
+
+        status = pt_insn_next(decoder, &insn, sizeof(insn));
+
+        if (insn.iclass != ptic_error)
+            <process instruction>(&insn);
+
+        if (status < 0)
+            break;
+
+        ...
+    }
+~~~
+
+Note that the example ignores non-error status returns.
+
+For each instruction, you get its IP, its size in bytes, the raw memory, an
+identifier for the image section that contained it, the current execution mode,
+and the speculation state, that is whether the instruction has been executed
+speculatively. In addition, you get a coarse classification that can be used
+for further processing without the need for a full instruction decode.
+
+If a traced image section cache is used the image section identifier can be used
+to trace an instruction back to the binary file that contained it. This allows
+mapping the instruction back to source code using the debug information
+contained in or reachable via the binary file.
+
+Beware that `pt_insn_next()` may indicate errors that occur after the returned
+instruction. The returned instruction is valid if its `iclass` field is not
+`ptic_error`.
+
+
+#### Events
+
+The instruction flow decoder uses an event system similar to the query
+decoder's. Pending events are indicated by the `pts_event_pending` flag in the
+status flag bit-vector returned from `pt_insn_sync_<where>()`, `pt_insn_next()`
+and `pt_insn_event()`.
+
+When the `pts_event_pending` flag is set on return from `pt_insn_next()`, use
+repeated calls to `pt_insn_event()` to drain all queued events. Then switch
+back to calling `pt_insn_next()` to resume with instruction flow decode as
+shown in the following example:
+
+~~~{.c}
+    struct pt_insn_decoder *decoder;
+    int status;
+
+    for (;;) {
+        struct pt_insn insn;
+
+        status = pt_insn_next(decoder, &insn, sizeof(insn));
+        if (status < 0)
+            break;
+
+        <process instruction>(&insn);
+
+        while (status & pts_event_pending) {
+            struct pt_event event;
+
+            status = pt_insn_event(decoder, &event, sizeof(event));
+            if (status < 0)
+                break;
+
+            <process event>(&event);
+        }
+
+        /* A negative status carries no flags; stop once draining fails. */
+        if (status < 0)
+            break;
+    }
+
+    <handle error>(status);
+~~~
+
+
+#### The Instruction Flow Decode Loop
+
+If we put all of the above examples together, we end up with a decode loop as
+shown below:
+
+~~~{.c}
+    int handle_events(struct pt_insn_decoder *decoder, int status)
+    {
+        while (status & pts_event_pending) {
+            struct pt_event event;
+
+            status = pt_insn_event(decoder, &event, sizeof(event));
+            if (status < 0)
+                break;
+
+            <process event>(&event);
+        }
+
+        return status;
+    }
+
+    int decode(struct pt_insn_decoder *decoder)
+    {
+        int status;
+
+        for (;;) {
+            status = pt_insn_sync_forward(decoder);
+            if (status < 0)
+                break;
+
+            for (;;) {
+                struct pt_insn insn;
+
+                status = handle_events(decoder, status);
+                if (status < 0)
+                    break;
+
+                status = pt_insn_next(decoder, &insn, sizeof(insn));
+
+                if (insn.iclass != ptic_error)
+                    <process instruction>(&insn);
+
+                if (status < 0)
+                    break;
+            }
+
+            <handle error>(status);
+        }
+
+        <handle error>(status);
+
+        return status;
+    }
+~~~
+
+
+## The Block Layer
+
+The block layer provides a simple API for iterating over blocks of sequential
+instructions in execution order. The instructions in a block are sequential in
+the sense that no trace is required for reconstructing the instructions. The IP
+of the first instruction is given in `struct pt_block` and the IP of other
+instructions in the block can be determined by decoding and examining the
+previous instruction.
+
+Start by configuring and allocating a `pt_block_decoder` as shown below:
+
+~~~{.c}
+    struct pt_block_decoder *decoder;
+    struct pt_config config;
+
+    memset(&config, 0, sizeof(config));
+    config.size = sizeof(config);
+    config.begin = <pt buffer begin>;
+    config.end = <pt buffer end>;
+    config.cpu = <cpu identifier>;
+    config.decode.callback = <decode function>;
+    config.decode.context = <decode context>;
+
+    decoder = pt_blk_alloc_decoder(&config);
+~~~
+
+An optional packet decode callback function may be specified in addition to the
+mandatory config fields. If specified, the callback function will be called for
+packets the decoder does not know about. The decoder will ignore the unknown
+packet except for its size in order to skip it. If there is no decode callback
+specified, the decoder will abort with `-pte_bad_opc`. In addition to the
+callback function pointer, an optional pointer to user-defined context
+information can be specified. This context will be passed to the decode
+callback function.
+
+
+#### Synchronizing
+
+Before the decoder can be used, it needs to be synchronized onto the Intel PT
+packet stream. To iterate over synchronization points in the Intel PT packet
+stream in forward or backward directions, the block decoder offers the following
+two synchronization functions respectively:
+
+    pt_blk_sync_forward()
+    pt_blk_sync_backward()
+
+
+To manually synchronize the decoder at a synchronization point (i.e. PSB packet)
+in the Intel PT packet stream, use the following function:
+
+    pt_blk_sync_set()
+
+
+The example below shows synchronization to the first synchronization point:
+
+~~~{.c}
+    struct pt_block_decoder *decoder;
+    int errcode;
+
+    errcode = pt_blk_sync_forward(decoder);
+    if (errcode < 0)
+        <handle error>(errcode);
+~~~
+
+The decoder will remember the last synchronization packet it decoded.
+Subsequent calls to `pt_blk_sync_forward` and `pt_blk_sync_backward` will use
+this as their starting point.
+
+You can get the current decoder position as an offset into the Intel PT buffer
+via:
+
+    pt_blk_get_offset()
+
+
+You can get the position of the last synchronization point as an offset into
+the Intel PT buffer via:
+
+    pt_blk_get_sync_offset()
+
+
+#### Iterating
+
+Once the decoder is synchronized, it can be used to iterate over blocks of
+instructions in execution flow order by repeated calls to `pt_blk_next()` as
+shown in the following example:
+
+~~~{.c}
+    struct pt_block_decoder *decoder;
+    int status;
+
+    for (;;) {
+        struct pt_block block;
+
+        status = pt_blk_next(decoder, &block, sizeof(block));
+
+        if (block.ninsn > 0)
+            <process block>(&block);
+
+        if (status < 0)
+            break;
+
+        ...
+    }
+~~~
+
+Note that the example ignores non-error status returns.
+
+A block contains enough information to reconstruct the instructions. See
+`struct pt_block` in `intel-pt.h` for details. Note that errors returned by
+`pt_blk_next()` apply after the last instruction in the provided block.
+
+It is recommended to use a traced image section cache so the image section
+identifier contained in a block can be used for reading the memory containing
+the instructions in the block. This also allows mapping the instructions back
+to source code using the debug information contained in or reachable via the
+binary file.
+
+In some cases, the last instruction in a block may cross image section
+boundaries. This can happen when a code segment is split into more than one
+image section. The block is marked truncated in this case and provides the raw
+bytes of the last instruction.
+
+The following example shows how instructions can be reconstructed from a block:
+
+~~~{.c}
+    struct pt_image_section_cache *iscache;
+    struct pt_block *block;
+    uint16_t ninsn;
+    uint64_t ip;
+
+    ip = block->ip;
+    for (ninsn = 0; ninsn < block->ninsn; ++ninsn) {
+        uint8_t raw[pt_max_insn_size];
+        <struct insn> insn;
+        int errcode, size;
+
+        /* Only the last instruction of a truncated block comes with raw
+         * bytes; all others are read from the image section cache.
+         */
+        if (block->truncated && ((ninsn + 1) == block->ninsn)) {
+            memcpy(raw, block->raw, block->size);
+            size = block->size;
+        } else {
+            size = pt_iscache_read(iscache, raw, sizeof(raw), block->isid, ip);
+            if (size < 0)
+                break;
+        }
+
+        errcode = <decode instruction>(&insn, raw, size, block->mode);
+        if (errcode < 0)
+            break;
+
+        <process instruction>(&insn);
+
+        ip = <determine next ip>(&insn);
+    }
+~~~
+
+
+#### Events
+
+The block decoder uses an event system similar to the query decoder's. Pending
+events are indicated by the `pts_event_pending` flag in the status flag
+bit-vector returned from `pt_blk_sync_<where>()`, `pt_blk_next()` and
+`pt_blk_event()`.
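+
+The way a caller reacts to such a status value can be sketched as a small
+dispatch function. This is only an illustration under stated assumptions: the
+`demo_*` names are hypothetical and not part of libipt, and the flag values
+are stand-ins. Real code should include `intel-pt.h` and test the library's
+own `pts_event_pending` and `pts_eos` enumerators instead.
+
```c
/* Stand-in flag values for this sketch only (assumption).  Real code
 * should use pts_event_pending and pts_eos from intel-pt.h.
 */
enum demo_status_flag {
    demo_event_pending = 1 << 0,
    demo_eos           = 1 << 2
};

/* Hypothetical classification of what the caller should do next. */
enum demo_action {
    demo_act_error,         /* negative status: an error code */
    demo_act_drain_events,  /* fetch events until the flag clears */
    demo_act_end_of_trace,  /* no more trace data is available */
    demo_act_decode         /* ask for the next block */
};

/* Map a decoder status to the caller's next step.
 *
 * Negative values are errors.  Non-negative values are a bit-vector of
 * status flags.  Pending events are checked first since they need to be
 * drained before decode resumes.
 */
static enum demo_action demo_next_action(int status)
{
    if (status < 0)
        return demo_act_error;

    if (status & demo_event_pending)
        return demo_act_drain_events;

    if (status & demo_eos)
        return demo_act_end_of_trace;

    return demo_act_decode;
}
```
+
+Checking for pending events before the end-of-trace indication is a choice
+made in this sketch so that queued events are not lost when both flags are set.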
+
+When the `pts_event_pending` flag is set on return from `pt_blk_sync_<where>()`
+or `pt_blk_next()`, use repeated calls to `pt_blk_event()` to drain all queued
+events. Then switch back to calling `pt_blk_next()` to resume with block decode
+as shown in the following example:
+
+~~~{.c}
+    struct pt_block_decoder *decoder;
+    int status;
+
+    for (;;) {
+        struct pt_block block;
+
+        status = pt_blk_next(decoder, &block, sizeof(block));
+        if (status < 0)
+            break;
+
+        <process block>(&block);
+
+        /* Drain all pending events before decoding further. */
+        while (status & pts_event_pending) {
+            struct pt_event event;
+
+            status = pt_blk_event(decoder, &event, sizeof(event));
+            if (status < 0)
+                <handle error>(status);
+
+            <process event>(&event);
+        }
+    }
+~~~
+
+
+#### The Block Decode Loop
+
+If we put all of the above examples together, we end up with a decode loop as
+shown below:
+
+~~~{.c}
+    int process_block(struct pt_block *block,
+                      struct pt_image_section_cache *iscache)
+    {
+        uint16_t ninsn;
+        uint64_t ip;
+
+        ip = block->ip;
+        for (ninsn = 0; ninsn < block->ninsn; ++ninsn) {
+            struct pt_insn insn;
+
+            memset(&insn, 0, sizeof(insn));
+            insn.speculative = block->speculative;
+            insn.isid = block->isid;
+            insn.mode = block->mode;
+            insn.ip = ip;
+
+            if (block->truncated && ((ninsn + 1) == block->ninsn)) {
+                insn.truncated = 1;
+                insn.size = block->size;
+
+                memcpy(insn.raw, block->raw, insn.size);
+            } else {
+                int size;
+
+                size = pt_iscache_read(iscache, insn.raw, sizeof(insn.raw),
+                                       insn.isid, insn.ip);
+                if (size < 0)
+                    return size;
+
+                insn.size = (uint8_t) size;
+            }
+
+            <decode instruction>(&insn);
+            <process instruction>(&insn);
+
+            ip = <determine next ip>(&insn);
+        }
+
+        return 0;
+    }
+
+    int handle_events(struct pt_block_decoder *decoder, int status)
+    {
+        while (status & pts_event_pending) {
+            struct pt_event event;
+
+            status = pt_blk_event(decoder, &event, sizeof(event));
+            if (status < 0)
+                break;
+
+            <process event>(&event);
+        }
+
+        return status;
+    }
+
+    int decode(struct pt_block_decoder *decoder,
+               struct pt_image_section_cache *iscache)
+    {
+        int status;
+
+        /* Outer loop: synchronize onto the next synchronization point. */
+        for (;;) {
+            status = pt_blk_sync_forward(decoder);
+            if (status < 0)
+                break;
+
+            /* Inner loop: drain events, then decode one block. */
+            for (;;) {
+                struct pt_block block;
+                int errcode;
+
+                status = handle_events(decoder, status);
+                if (status < 0)
+                    break;
+
+                status = pt_blk_next(decoder, &block, sizeof(block));
+
+                errcode = process_block(&block, iscache);
+                if (errcode < 0)
+                    status = errcode;
+
+                if (status < 0)
+                    break;
+            }
+
+            <handle error>(status);
+        }
+
+        <handle error>(status);
+
+        return status;
+    }
+~~~
+
+
+## Parallel Decode
+
+Intel PT splits naturally into self-contained PSB segments that can be decoded
+independently. Use the packet or query decoder to search for PSBs with
+repeated calls to `pt_pkt_sync_forward()` and `pt_pkt_get_sync_offset()` (or
+`pt_qry_sync_forward()` and `pt_qry_get_sync_offset()`). The following example
+shows this using the query decoder, which also provides the IP needed in the
+next step.
+
+~~~{.c}
+    struct pt_query_decoder *decoder;
+    uint64_t offset, ip;
+    int status, errcode;
+
+    for (;;) {
+        status = pt_qry_sync_forward(decoder, &ip);
+        if (status < 0)
+            break;
+
+        errcode = pt_qry_get_sync_offset(decoder, &offset);
+        if (errcode < 0)
+            <handle error>(errcode);
+
+        <split trace>(offset, ip, status);
+    }
+~~~
+
+The individual trace segments can then be decoded using the query, instruction
+flow, or block decoder as shown above in the previous examples.
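+
+The `<split trace>` step is left open above. As one possible sketch, assuming
+the sync offsets and IPs were collected into arrays in ascending order, each
+segment can be described as a half-open byte range that ends where the next
+PSB begins. The type `struct trace_segment` and the function
+`build_segments()` are hypothetical names used for illustration; they are not
+part of libipt.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one PSB segment of the trace buffer, given as
 * a half-open byte range [begin, end) plus the IP reported at its start.
 */
struct trace_segment {
    uint64_t begin;
    uint64_t end;
    uint64_t ip;
};

/* Build segments from n ascending sync offsets (e.g. as collected from
 * pt_qry_get_sync_offset()) and the matching start IPs.
 *
 * Each segment ends where the next one begins; the last segment ends at
 * buffer_size.  Returns the number of segments written, or -1 if the
 * input is inconsistent or does not fit into segs.
 */
static int build_segments(struct trace_segment *segs, size_t max_segs,
                          const uint64_t *offsets, const uint64_t *ips,
                          size_t n, uint64_t buffer_size)
{
    size_t i;

    if (!segs || !offsets || !ips || n > max_segs)
        return -1;

    for (i = 0; i < n; ++i) {
        uint64_t end = (i + 1 < n) ? offsets[i + 1] : buffer_size;

        /* Offsets must be strictly ascending and inside the buffer. */
        if (offsets[i] >= end)
            return -1;

        segs[i].begin = offsets[i];
        segs[i].end = end;
        segs[i].ip = ips[i];
    }

    return (int) n;
}
```
+
+Each resulting range, added to the buffer's base address, could then serve as
+`config.begin` and `config.end` for a separate decoder. The sketch copies the
+IP as is; a caller would still need to check the status returned from
+`pt_qry_sync_forward()` to know whether that IP is valid.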
+
+When stitching decoded trace segments together, a sequence of linear (in the
+sense that it can be decoded without Intel PT) code has to be filled in. Use
+the `pts_eos` status indication to stop decoding early enough. Then proceed
+until the IP at the start of the succeeding trace segment is reached. When
+using the instruction flow decoder, `pt_insn_next()` may be used for that as
+shown in the following example:
+
+~~~{.c}
+    struct pt_insn_decoder *decoder;
+    struct pt_insn insn;
+    int status;
+
+    /* Decode until the end of this segment's trace is indicated. */
+    for (;;) {
+        status = pt_insn_next(decoder, &insn, sizeof(insn));
+        if (status < 0)
+            <handle error>(status);
+
+        if (status & pts_eos)
+            break;
+
+        <process instruction>(&insn);
+    }
+
+    /* Fill in linear code up to the start of the next segment. */
+    while (insn.ip != <next segment's start IP>) {
+        <process instruction>(&insn);
+
+        status = pt_insn_next(decoder, &insn, sizeof(insn));
+        if (status < 0)
+            <handle error>(status);
+    }
+~~~
+
+
+## Threading
+
+The decoder library API is not thread-safe. Different threads may allocate and
+use different decoder objects at the same time. Different decoders must not use
+the same image object. Use `pt_image_copy()` to give each decoder its own copy
+of a shared master image.