6 files changed, 62 insertions, 36 deletions
diff --git a/docs/AMDGPUUsage.rst b/docs/AMDGPUUsage.rst
index caa697ca28cdf..57822ae9ab0a9 100644
--- a/docs/AMDGPUUsage.rst
+++ b/docs/AMDGPUUsage.rst
@@ -587,7 +587,7 @@ Code Object Metadata
 The code object metadata is specified by the ``NT_AMD_AMDHSA_METADATA`` note
 record (see :ref:`amdgpu-note-records`).
 
-The metadata is specified as a YAML formated string (see [YAML]_ and
+The metadata is specified as a YAML formatted string (see [YAML]_ and
 :doc:`YamlIO`).
 
 The metadata is represented as a single YAML document comprised of the mapping
@@ -1031,11 +1031,11 @@ Global variable
   appropriate section according to if it has initialized data or is readonly.
 
   If the symbol is external then its section is ``STN_UNDEF`` and the loader
-  will resolve relocations using the defintion provided by another code object
+  will resolve relocations using the definition provided by another code object
   or explicitly defined by the runtime.
 
   All global symbols, whether defined in the compilation unit or external, are
-  accessed by the machine code indirectly throught a GOT table entry. This
+  accessed by the machine code indirectly through a GOT table entry. This
   allows them to be preemptable. The GOT table is only supported when the target
   triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`).
 
@@ -1160,7 +1160,7 @@ Register Mapping
    Define DWARF register enumeration.
 
    If want to present a wavefront state then should expose vector registers as
-   64 wide (rather than per work-item view that LLVM uses). Either as seperate
+   64 wide (rather than per work-item view that LLVM uses). Either as separate
    registers, or a 64x4 byte single register. In either case use a new LANE op
    (akin to XDREF) to select the current lane usage in a location
    expression. This would also allow scalar register spilling to vector register
@@ -1653,7 +1653,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      ``COMPUTE_PGM_RSRC2.USER_SGPR``.
      6       1 bit   enable_trap_handler             Set to 1 if code contains a
                                                      TRAP instruction which
-                                                     requires a trap hander to
+                                                     requires a trap handler to
                                                      be enabled.
 
                                                      CP sets
@@ -2146,7 +2146,7 @@ This section describes the mapping of LLVM memory model onto AMDGPU machine code
 .. TODO
    Update when implementation complete.
 
-   Support more relaxed OpenCL memory model to be controled by environment
+   Support more relaxed OpenCL memory model to be controlled by environment
    component of target triple.
 
 The AMDGPU backend supports the memory synchronization scopes specified in
@@ -2201,7 +2201,7 @@ For GFX6-GFX9:
   can be reordered relative to each other, which can result in reordering the
   visibility of vector memory operations with respect to LDS operations of other
   wavefronts in the same work-group. A ``s_waitcnt lgkmcnt(0)`` is required to
-  ensure synchonization between LDS operations and vector memory operations
+  ensure synchronization between LDS operations and vector memory operations
   between waves of a work-group, but not between operations performed by the
   same wavefront.
 * The vector memory operations are performed as wavefront wide operations and
@@ -2226,7 +2226,7 @@ For GFX6-GFX9:
   scalar memory operations performed by waves executing in different work-groups
   (which may be executing on different CUs) of an agent can be reordered
   relative to each other. A ``s_waitcnt vmcnt(0)`` is required to ensure
-  synchonization between vector memory operations of different CUs. It ensures a
+  synchronization between vector memory operations of different CUs. It ensures a
   previous vector memory operation has completed before executing a subsequent
   vector memory or LDS operation and so can be used to meet the requirements of
   acquire and release.
@@ -2268,7 +2268,7 @@ and vector L1 caches are invalidated between kernel dispatches by CP since
 constant address space data may change between kernel dispatch executions. See
 :ref:`amdgpu-amdhsa-memory-spaces`.
 
-The one exeception is if scalar writes are used to spill SGPR registers. In this
+The one exception is if scalar writes are used to spill SGPR registers. In this
 case the AMDGPU backend ensures the memory location used to spill is never
 accessed by vector memory operations at the same time. If scalar writes are used
 then a ``s_dcache_wb`` is inserted before the ``s_endpgm`` and before a function
@@ -3310,7 +3310,7 @@ table
                     be moved before the acquire.
                   - If a fence then same as load atomic, plus no preceding
                     associated fence-paired-atomic can be moved after the fence.
-     release      - If a store atomic/atomicrmw then no preceeding load/load
+     release      - If a store atomic/atomicrmw then no preceding load/load
                     atomic/store/ store atomic/atomicrmw/fence instruction can
                     be moved after the release.
                   - If a fence then same as store atomic, plus no following
diff --git a/docs/GetElementPtr.rst b/docs/GetElementPtr.rst
index d13479dabca81..b593871695fac 100644
--- a/docs/GetElementPtr.rst
+++ b/docs/GetElementPtr.rst
@@ -27,7 +27,7 @@ questions.
 What is the first index of the GEP instruction?
 -----------------------------------------------
 
-Quick answer: The index stepping through the first operand.
+Quick answer: The index stepping through the second operand.
 
 The confusion with the first index usually arises from thinking about the
 GetElementPtr instruction as if it was a C index operator. They aren't the
@@ -59,7 +59,7 @@ Sometimes this question gets rephrased as:
   won't be dereferenced?*
 
 The answer is simply because memory does not have to be accessed to perform the
-computation. The first operand to the GEP instruction must be a value of a
+computation. The second operand to the GEP instruction must be a value of a
 pointer type. The value of the pointer is provided directly to the GEP
 instruction as an operand without any need for accessing memory. It must,
 therefore be indexed and requires an index operand. Consider this example:
@@ -80,8 +80,8 @@ therefore be indexed and requires an index operand. Consider this example:
 
 In this "C" example, the front end compiler (Clang) will generate three GEP
 instructions for the three indices through "P" in the assignment statement.  The
-function argument ``P`` will be the first operand of each of these GEP
-instructions.  The second operand indexes through that pointer.  The third
+function argument ``P`` will be the second operand of each of these GEP
+instructions.  The third operand indexes through that pointer.  The fourth
 operand will be the field offset into the ``struct munger_struct`` type, for
 either the ``f1`` or ``f2`` field. So, in LLVM assembly the ``munge`` function
 looks like:
@@ -100,8 +100,8 @@ looks like:
     ret void
   }
 
-In each case the first operand is the pointer through which the GEP instruction
-starts. The same is true whether the first operand is an argument, allocated
+In each case the second operand is the pointer through which the GEP instruction
+starts. The same is true whether the second operand is an argument, allocated
 memory, or a global variable.
 
 To make this clear, let's consider a more obtuse example:
@@ -159,11 +159,11 @@ confusion:
    i32 }*``. That is, ``%MyStruct`` is a pointer to a structure containing a
    pointer to a ``float`` and an ``i32``.
 
-#. Point #1 is evidenced by noticing the type of the first operand of the GEP
+#. Point #1 is evidenced by noticing the type of the second operand of the GEP
    instruction (``%MyStruct``) which is ``{ float*, i32 }*``.
 
 #. The first index, ``i64 0`` is required to step over the global variable
-   ``%MyStruct``.  Since the first argument to the GEP instruction must always
+   ``%MyStruct``.  Since the second argument to the GEP instruction must always
    be a value of pointer type, the first index steps through that pointer. A
    value of 0 means 0 elements offset from that pointer.
 
@@ -267,7 +267,7 @@ in the IR. In the future, it will probably be outright disallowed.
 What effect do address spaces have on GEPs?
 -------------------------------------------
 
-None, except that the address space qualifier on the first operand pointer type
+None, except that the address space qualifier on the second operand pointer type
 always matches the address space qualifier on the result type.
 
 How is GEP different from ``ptrtoint``, arithmetic, and ``inttoptr``?
@@ -526,7 +526,7 @@ instruction:
 #. The GEP instruction never accesses memory, it only provides pointer
    computations.
 
-#. The first operand to the GEP instruction is always a pointer and it must be
+#. The second operand to the GEP instruction is always a pointer and it must be
    indexed.
 
 #. There are no superfluous indices for the GEP instruction.
diff --git a/docs/GoldPlugin.rst b/docs/GoldPlugin.rst
index 88b944a2a0fdd..78d38ccb32bd1 100644
--- a/docs/GoldPlugin.rst
+++ b/docs/GoldPlugin.rst
@@ -7,7 +7,7 @@ Introduction
 
 Building with link time optimization requires cooperation from
 the system linker. LTO support on Linux systems requires that you use the
-`gold linker`_ which supports LTO via plugins. This is the same mechanism
+`gold linker`_ or ld.bfd from binutils >= 2.21.51.0.2, as they support LTO via plugins. This is the same mechanism
 used by the `GCC LTO`_ project.
 
 The LLVM gold plugin implements the gold plugin interface on top of
@@ -23,24 +23,22 @@ The LLVM gold plugin implements the gold plugin interface on top of
 How to build it
 ===============
 
-You need to have gold with plugin support and build the LLVMgold plugin.
-Check whether you have gold running ``/usr/bin/ld -v``. It will report "GNU
-gold" or else "GNU ld" if not. If you have gold, check for plugin support
-by running ``/usr/bin/ld -plugin``. If it complains "missing argument" then
-you have plugin support. If not, such as an "unknown option" error then you
-will either need to build gold or install a version with plugin support.
+Check for plugin support by running ``/usr/bin/ld -plugin``. If it complains
+"missing argument" then you have plugin support. If it reports an error such as
+"unknown option", you will need to either build gold or install a recent
+version of ld.bfd with plugin support, and then build the gold plugin.
 
-* Download, configure and build gold with plugin support:
+* Download, configure and build ld.bfd with plugin support:
 
   .. code-block:: bash
 
      $ git clone --depth 1 git://sourceware.org/git/binutils-gdb.git binutils
      $ mkdir build
      $ cd build
-     $ ../binutils/configure --enable-gold --enable-plugins --disable-werror
-     $ make all-gold
+     $ ../binutils/configure --disable-werror # ld.bfd includes plugin support by default
+     $ make all-ld
 
-  That should leave you with ``build/gold/ld-new`` which supports
+  That should leave you with ``build/ld/ld-new`` which supports
   the ``-plugin`` option. Running ``make`` will additionally build
   ``build/binutils/ar`` and ``nm-new`` binaries supporting plugins.
 
diff --git a/docs/LangRef.rst b/docs/LangRef.rst
index 68aa500150ae3..2a0812ab930fb 100644
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -1468,6 +1468,19 @@ example:
     This attribute by itself does not imply restrictions on
     inter-procedural optimizations.  All of the semantic effects the
     patching may have to be separately conveyed via the linkage type.
+``"probe-stack"``
+    This attribute indicates that the function will trigger a guard region
+    at the end of the stack. It ensures that accesses to the stack must be
+    no further apart than the size of the guard region to a previous
+    access of the stack. It takes one required string value, the name of
+    the stack probing function that will be called.
+
+    If a function that has a ``"probe-stack"`` attribute is inlined into
+    a function with another ``"probe-stack"`` attribute, the resulting
+    function has the ``"probe-stack"`` attribute of the caller. If a
+    function that has a ``"probe-stack"`` attribute is inlined into a
+    function that has no ``"probe-stack"`` attribute at all, the resulting
+    function has the ``"probe-stack"`` attribute of the callee.
 ``readnone``
     On a function, this attribute indicates that the function computes its
     result (or decides to unwind an exception) based strictly on its arguments,
@@ -1498,6 +1511,21 @@ example:
     On an argument, this attribute indicates that the function does not write
     through this pointer argument, even though it may write to the memory that
     the pointer points to.
+``"stack-probe-size"``
+    This attribute controls the behavior of stack probes: either
+    the ``"probe-stack"`` attribute, or ABI-required stack probes, if any.
+    It defines the size of the guard region. It ensures that if the function
+    may use more stack space than the size of the guard region, a stack probing
+    sequence will be emitted. It takes one required integer value, which
+    is 4096 by default.
+
+    If a function that has a ``"stack-probe-size"`` attribute is inlined into
+    a function with another ``"stack-probe-size"`` attribute, the resulting
+    function has the ``"stack-probe-size"`` attribute that has the lower
+    numeric value. If a function that has a ``"stack-probe-size"`` attribute is
+    inlined into a function that has no ``"stack-probe-size"`` attribute
+    at all, the resulting function has the ``"stack-probe-size"`` attribute
+    of the callee.
 ``writeonly``
     On a function, this attribute indicates that the function may write to but
     does not read from memory.
@@ -1989,7 +2017,7 @@ A pointer value is *based* on another pointer value according to the
 following rules:
 
 -  A pointer value formed from a ``getelementptr`` operation is *based*
-   on the first value operand of the ``getelementptr``.
+   on the second value operand of the ``getelementptr``.
 -  The result value of a ``bitcast`` is *based* on the operand of the
    ``bitcast``.
 -  A pointer value formed by an ``inttoptr`` is *based* on all pointer
@@ -3166,7 +3194,7 @@ The following is the syntax for constant expressions:
 ``getelementptr (TY, CSTPTR, IDX0, IDX1, ...)``, ``getelementptr inbounds (TY, CSTPTR, IDX0, IDX1, ...)``
     Perform the :ref:`getelementptr operation <i_getelementptr>` on
     constants. As with the :ref:`getelementptr <i_getelementptr>`
-    instruction, the index list may have zero or more indexes, which are
+    instruction, the index list may have one or more indexes, which are
     required to make sense for the type of "pointer to TY".
 ``select (COND, VAL1, VAL2)``
     Perform the :ref:`select operation <i_select>` on constants.
@@ -7805,7 +7833,7 @@ base address to start from. The remaining arguments are indices
 that indicate which of the elements of the aggregate object are indexed.
 The interpretation of each index is dependent on the type being indexed
 into. The first index always indexes the pointer value given as the
-first argument, the second index indexes a value of the type pointed to
+second argument, the second index indexes a value of the type pointed to
 (not necessarily the value directly pointed to, since the first index
 can be non-zero), etc. The first type indexed into must be a pointer
 value, subsequent types can be arrays, vectors, and structs. Note that
diff --git a/docs/Proposals/VectorizationPlan.rst b/docs/Proposals/VectorizationPlan.rst
index 82ce4b2de17af..aed8e3d2b7935 100644
--- a/docs/Proposals/VectorizationPlan.rst
+++ b/docs/Proposals/VectorizationPlan.rst
@@ -27,7 +27,7 @@ Vectorization Workflow
 VPlan-based vectorization involves three major steps, taking a "scenario-based
 approach" to vectorization planning:
 
-1. Legal Step: check if a loop can be legally vectorized; encode contraints and
+1. Legal Step: check if a loop can be legally vectorized; encode constraints and
    artifacts if so.
 2. Plan Step:
 
diff --git a/docs/XRay.rst b/docs/XRay.rst
index d650319e99220..e43f78e5ffe57 100644
--- a/docs/XRay.rst
+++ b/docs/XRay.rst
@@ -150,7 +150,7 @@ variable, where we list down the options and their defaults below.
 | xray_logfile_base | ``const char*`` | ``xray-log.`` | Filename base for the  |
 |                   |                 |               | XRay logfile.          |
 +-------------------+-----------------+---------------+------------------------+
-| xray_fdr_log      | ``bool``        | ``false``     | Wheter to install the  |
+| xray_fdr_log      | ``bool``        | ``false``     | Whether to install the |
 |                   |                 |               | Flight Data Recorder   |
 |                   |                 |               | (FDR) mode.            |
 +-------------------+-----------------+---------------+------------------------+