Diffstat (limited to 'docs')
-rw-r--r-- | docs/AMDGPUUsage.rst | 20
-rw-r--r-- | docs/GetElementPtr.rst | 20
-rw-r--r-- | docs/GoldPlugin.rst | 20
-rw-r--r-- | docs/LangRef.rst | 34
-rw-r--r-- | docs/Proposals/VectorizationPlan.rst | 2
-rw-r--r-- | docs/XRay.rst | 2
6 files changed, 62 insertions, 36 deletions
diff --git a/docs/AMDGPUUsage.rst b/docs/AMDGPUUsage.rst index caa697ca28cdf..57822ae9ab0a9 100644 --- a/docs/AMDGPUUsage.rst +++ b/docs/AMDGPUUsage.rst @@ -587,7 +587,7 @@ Code Object Metadata The code object metadata is specified by the ``NT_AMD_AMDHSA_METADATA`` note record (see :ref:`amdgpu-note-records`). -The metadata is specified as a YAML formated string (see [YAML]_ and +The metadata is specified as a YAML formatted string (see [YAML]_ and :doc:`YamlIO`). The metadata is represented as a single YAML document comprised of the mapping @@ -1031,11 +1031,11 @@ Global variable appropriate section according to if it has initialized data or is readonly. If the symbol is external then its section is ``STN_UNDEF`` and the loader - will resolve relocations using the defintion provided by another code object + will resolve relocations using the definition provided by another code object or explicitly defined by the runtime. All global symbols, whether defined in the compilation unit or external, are - accessed by the machine code indirectly throught a GOT table entry. This + accessed by the machine code indirectly through a GOT table entry. This allows them to be preemptable. The GOT table is only supported when the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). @@ -1160,7 +1160,7 @@ Register Mapping Define DWARF register enumeration. If want to present a wavefront state then should expose vector registers as - 64 wide (rather than per work-item view that LLVM uses). Either as seperate + 64 wide (rather than per work-item view that LLVM uses). Either as separate registers, or a 64x4 byte single register. In either case use a new LANE op (akin to XDREF) to select the current lane usage in a location expression. This would also allow scalar register spilling to vector register @@ -1653,7 +1653,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment. ``COMPUTE_PGM_RSRC2.USER_SGPR``. 
6 1 bit enable_trap_handler Set to 1 if code contains a TRAP instruction which - requires a trap hander to + requires a trap handler to be enabled. CP sets @@ -2146,7 +2146,7 @@ This section describes the mapping of LLVM memory model onto AMDGPU machine code .. TODO Update when implementation complete. - Support more relaxed OpenCL memory model to be controled by environment + Support more relaxed OpenCL memory model to be controlled by environment component of target triple. The AMDGPU backend supports the memory synchronization scopes specified in @@ -2201,7 +2201,7 @@ For GFX6-GFX9: can be reordered relative to each other, which can result in reordering the visibility of vector memory operations with respect to LDS operations of other wavefronts in the same work-group. A ``s_waitcnt lgkmcnt(0)`` is required to - ensure synchonization between LDS operations and vector memory operations + ensure synchronization between LDS operations and vector memory operations between waves of a work-group, but not between operations performed by the same wavefront. * The vector memory operations are performed as wavefront wide operations and @@ -2226,7 +2226,7 @@ For GFX6-GFX9: scalar memory operations performed by waves executing in different work-groups (which may be executing on different CUs) of an agent can be reordered relative to each other. A ``s_waitcnt vmcnt(0)`` is required to ensure - synchonization between vector memory operations of different CUs. It ensures a + synchronization between vector memory operations of different CUs. It ensures a previous vector memory operation has completed before executing a subsequent vector memory or LDS operation and so can be used to meet the requirements of acquire and release. @@ -2268,7 +2268,7 @@ and vector L1 caches are invalidated between kernel dispatches by CP since constant address space data may change between kernel dispatch executions. See :ref:`amdgpu-amdhsa-memory-spaces`. 
-The one exeception is if scalar writes are used to spill SGPR registers. In this +The one execption is if scalar writes are used to spill SGPR registers. In this case the AMDGPU backend ensures the memory location used to spill is never accessed by vector memory operations at the same time. If scalar writes are used then a ``s_dcache_wb`` is inserted before the ``s_endpgm`` and before a function @@ -3310,7 +3310,7 @@ table be moved before the acquire. - If a fence then same as load atomic, plus no preceding associated fence-paired-atomic can be moved after the fence. - release - If a store atomic/atomicrmw then no preceeding load/load + release - If a store atomic/atomicrmw then no preceding load/load atomic/store/ store atomic/atomicrmw/fence instruction can be moved after the release. - If a fence then same as store atomic, plus no following diff --git a/docs/GetElementPtr.rst b/docs/GetElementPtr.rst index d13479dabca81..b593871695fac 100644 --- a/docs/GetElementPtr.rst +++ b/docs/GetElementPtr.rst @@ -27,7 +27,7 @@ questions. What is the first index of the GEP instruction? ----------------------------------------------- -Quick answer: The index stepping through the first operand. +Quick answer: The index stepping through the second operand. The confusion with the first index usually arises from thinking about the GetElementPtr instruction as if it was a C index operator. They aren't the @@ -59,7 +59,7 @@ Sometimes this question gets rephrased as: won't be dereferenced?* The answer is simply because memory does not have to be accessed to perform the -computation. The first operand to the GEP instruction must be a value of a +computation. The second operand to the GEP instruction must be a value of a pointer type. The value of the pointer is provided directly to the GEP instruction as an operand without any need for accessing memory. It must, therefore be indexed and requires an index operand. 
Consider this example: @@ -80,8 +80,8 @@ therefore be indexed and requires an index operand. Consider this example: In this "C" example, the front end compiler (Clang) will generate three GEP instructions for the three indices through "P" in the assignment statement. The -function argument ``P`` will be the first operand of each of these GEP -instructions. The second operand indexes through that pointer. The third +function argument ``P`` will be the second operand of each of these GEP +instructions. The third operand indexes through that pointer. The fourth operand will be the field offset into the ``struct munger_struct`` type, for either the ``f1`` or ``f2`` field. So, in LLVM assembly the ``munge`` function looks like: @@ -100,8 +100,8 @@ looks like: ret void } -In each case the first operand is the pointer through which the GEP instruction -starts. The same is true whether the first operand is an argument, allocated +In each case the second operand is the pointer through which the GEP instruction +starts. The same is true whether the second operand is an argument, allocated memory, or a global variable. To make this clear, let's consider a more obtuse example: @@ -159,11 +159,11 @@ confusion: i32 }*``. That is, ``%MyStruct`` is a pointer to a structure containing a pointer to a ``float`` and an ``i32``. -#. Point #1 is evidenced by noticing the type of the first operand of the GEP +#. Point #1 is evidenced by noticing the type of the second operand of the GEP instruction (``%MyStruct``) which is ``{ float*, i32 }*``. #. The first index, ``i64 0`` is required to step over the global variable - ``%MyStruct``. Since the first argument to the GEP instruction must always + ``%MyStruct``. Since the second argument to the GEP instruction must always be a value of pointer type, the first index steps through that pointer. A value of 0 means 0 elements offset from that pointer. @@ -267,7 +267,7 @@ in the IR. In the future, it will probably be outright disallowed. 
What effect do address spaces have on GEPs? ------------------------------------------- -None, except that the address space qualifier on the first operand pointer type +None, except that the address space qualifier on the second operand pointer type always matches the address space qualifier on the result type. How is GEP different from ``ptrtoint``, arithmetic, and ``inttoptr``? @@ -526,7 +526,7 @@ instruction: #. The GEP instruction never accesses memory, it only provides pointer computations. -#. The first operand to the GEP instruction is always a pointer and it must be +#. The second operand to the GEP instruction is always a pointer and it must be indexed. #. There are no superfluous indices for the GEP instruction. diff --git a/docs/GoldPlugin.rst b/docs/GoldPlugin.rst index 88b944a2a0fdd..78d38ccb32bd1 100644 --- a/docs/GoldPlugin.rst +++ b/docs/GoldPlugin.rst @@ -7,7 +7,7 @@ Introduction Building with link time optimization requires cooperation from the system linker. LTO support on Linux systems requires that you use the -`gold linker`_ which supports LTO via plugins. This is the same mechanism +`gold linker`_ or ld.bfd from binutils >= 2.21.51.0.2, as they support LTO via plugins. This is the same mechanism used by the `GCC LTO`_ project. The LLVM gold plugin implements the gold plugin interface on top of @@ -23,24 +23,22 @@ The LLVM gold plugin implements the gold plugin interface on top of How to build it =============== -You need to have gold with plugin support and build the LLVMgold plugin. -Check whether you have gold running ``/usr/bin/ld -v``. It will report "GNU -gold" or else "GNU ld" if not. If you have gold, check for plugin support -by running ``/usr/bin/ld -plugin``. If it complains "missing argument" then -you have plugin support. If not, such as an "unknown option" error then you -will either need to build gold or install a version with plugin support. +Check for plugin support by running ``/usr/bin/ld -plugin``. 
If it complains +"missing argument" then you have plugin support. If not, such as an "unknown option" +error then you will either need to build gold or install a recent version +of ld.bfd with plugin support and then build gold plugin. -* Download, configure and build gold with plugin support: +* Download, configure and build ld.bfd with plugin support: .. code-block:: bash $ git clone --depth 1 git://sourceware.org/git/binutils-gdb.git binutils $ mkdir build $ cd build - $ ../binutils/configure --enable-gold --enable-plugins --disable-werror - $ make all-gold + $ ../binutils/configure --disable-werror # ld.bfd includes plugin support by default + $ make all-ld - That should leave you with ``build/gold/ld-new`` which supports + That should leave you with ``build/ld/ld-new`` which supports the ``-plugin`` option. Running ``make`` will additionally build ``build/binutils/ar`` and ``nm-new`` binaries supporting plugins. diff --git a/docs/LangRef.rst b/docs/LangRef.rst index 68aa500150ae3..2a0812ab930fb 100644 --- a/docs/LangRef.rst +++ b/docs/LangRef.rst @@ -1468,6 +1468,19 @@ example: This attribute by itself does not imply restrictions on inter-procedural optimizations. All of the semantic effects the patching may have to be separately conveyed via the linkage type. +``"probe-stack"`` + This attribute indicates that the function will trigger a guard region + in the end of the stack. It ensures that accesses to the stack must be + no further apart than the size of the guard region to a previous + access of the stack. It takes one required string value, the name of + the stack probing function that will be called. + + If a function that has a ``"probe-stack"`` attribute is inlined into + a function with another ``"probe-stack"`` attribute, the resulting + function has the ``"probe-stack"`` attribute of the caller. 
If a + function that has a ``"probe-stack"`` attribute is inlined into a + function that has no ``"probe-stack"`` attribute at all, the resulting + function has the ``"probe-stack"`` attribute of the callee. ``readnone`` On a function, this attribute indicates that the function computes its result (or decides to unwind an exception) based strictly on its arguments, @@ -1498,6 +1511,21 @@ example: On an argument, this attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to. +``"stack-probe-size"`` + This attribute controls the behavior of stack probes: either + the ``"probe-stack"`` attribute, or ABI-required stack probes, if any. + It defines the size of the guard region. It ensures that if the function + may use more stack space than the size of the guard region, stack probing + sequence will be emitted. It takes one required integer value, which + is 4096 by default. + + If a function that has a ``"stack-probe-size"`` attribute is inlined into + a function with another ``"stack-probe-size"`` attribute, the resulting + function has the ``"stack-probe-size"`` attribute that has the lower + numeric value. If a function that has a ``"stack-probe-size"`` attribute is + inlined into a function that has no ``"stack-probe-size"`` attribute + at all, the resulting function has the ``"stack-probe-size"`` attribute + of the callee. ``writeonly`` On a function, this attribute indicates that the function may write to but does not read from memory. @@ -1989,7 +2017,7 @@ A pointer value is *based* on another pointer value according to the following rules: - A pointer value formed from a ``getelementptr`` operation is *based* - on the first value operand of the ``getelementptr``. + on the second value operand of the ``getelementptr``. - The result value of a ``bitcast`` is *based* on the operand of the ``bitcast``. 
- A pointer value formed by an ``inttoptr`` is *based* on all pointer @@ -3166,7 +3194,7 @@ The following is the syntax for constant expressions: ``getelementptr (TY, CSTPTR, IDX0, IDX1, ...)``, ``getelementptr inbounds (TY, CSTPTR, IDX0, IDX1, ...)`` Perform the :ref:`getelementptr operation <i_getelementptr>` on constants. As with the :ref:`getelementptr <i_getelementptr>` - instruction, the index list may have zero or more indexes, which are + instruction, the index list may have one or more indexes, which are required to make sense for the type of "pointer to TY". ``select (COND, VAL1, VAL2)`` Perform the :ref:`select operation <i_select>` on constants. @@ -7805,7 +7833,7 @@ base address to start from. The remaining arguments are indices that indicate which of the elements of the aggregate object are indexed. The interpretation of each index is dependent on the type being indexed into. The first index always indexes the pointer value given as the -first argument, the second index indexes a value of the type pointed to +second argument, the second index indexes a value of the type pointed to (not necessarily the value directly pointed to, since the first index can be non-zero), etc. The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs. Note that diff --git a/docs/Proposals/VectorizationPlan.rst b/docs/Proposals/VectorizationPlan.rst index 82ce4b2de17af..aed8e3d2b7935 100644 --- a/docs/Proposals/VectorizationPlan.rst +++ b/docs/Proposals/VectorizationPlan.rst @@ -27,7 +27,7 @@ Vectorization Workflow VPlan-based vectorization involves three major steps, taking a "scenario-based approach" to vectorization planning: -1. Legal Step: check if a loop can be legally vectorized; encode contraints and +1. Legal Step: check if a loop can be legally vectorized; encode constraints and artifacts if so. 2. 
Plan Step: diff --git a/docs/XRay.rst b/docs/XRay.rst index d650319e99220..e43f78e5ffe57 100644 --- a/docs/XRay.rst +++ b/docs/XRay.rst @@ -150,7 +150,7 @@ variable, where we list down the options and their defaults below. | xray_logfile_base | ``const char*`` | ``xray-log.`` | Filename base for the | | | | | XRay logfile. | +-------------------+-----------------+---------------+------------------------+ -| xray_fdr_log | ``bool`` | ``false`` | Wheter to install the | +| xray_fdr_log | ``bool`` | ``false`` | Whether to install the | | | | | Flight Data Recorder | | | | | (FDR) mode. | +-------------------+-----------------+---------------+------------------------+
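The inlining rules that the docs/LangRef.rst hunks in this diff add for ``"probe-stack"`` and ``"stack-probe-size"`` can be restated as a minimal Python sketch. This is not LLVM's implementation: the helper names ``merge_probe_stack`` and ``merge_stack_probe_size`` and the dict-of-strings attribute model are hypothetical, chosen only to mirror the wording of the added documentation. The added text does not spell out the case where only the caller carries the attribute; the sketch assumes the caller's value is kept.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a function's string attributes, e.g.
# {"probe-stack": "__probestack", "stack-probe-size": "4096"}.
Attrs = Dict[str, str]


def merge_probe_stack(caller: Attrs, callee: Attrs) -> Optional[str]:
    """Value of "probe-stack" after inlining callee into caller.

    Both have it  -> the caller's value wins.
    Only callee   -> the callee's value is adopted.
    Only caller   -> caller's value kept (assumption, not stated in the text).
    """
    if "probe-stack" in caller:
        return caller["probe-stack"]
    return callee.get("probe-stack")


def merge_stack_probe_size(caller: Attrs, callee: Attrs) -> Optional[int]:
    """Value of "stack-probe-size" after inlining callee into caller.

    Both have it  -> the lower numeric value wins.
    Only one side -> that side's value is used.
    Neither       -> None (attribute absent).
    """
    sizes = [
        int(attrs["stack-probe-size"])
        for attrs in (caller, callee)
        if "stack-probe-size" in attrs
    ]
    return min(sizes) if sizes else None
```

For instance, inlining a callee with ``"stack-probe-size"="1024"`` into a caller with ``"stack-probe-size"="4096"`` yields 1024 under this model, while the caller's ``"probe-stack"`` function name is the one retained when both functions name one.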