Diffstat (limited to 'docs')
63 files changed, 3907 insertions, 1876 deletions
diff --git a/docs/AliasAnalysis.rst b/docs/AliasAnalysis.rst index f62cc3fe4d31..fe7fcbd4bc50 100644 --- a/docs/AliasAnalysis.rst +++ b/docs/AliasAnalysis.rst @@ -389,11 +389,10 @@ in its ``getAnalysisUsage`` that it does so. Some passes attempt to use ``AU.addPreserved<AliasAnalysis>``, however this doesn't actually have any effect. -``AliasAnalysisCounter`` (``-count-aa``) and ``AliasDebugger`` (``-debug-aa``) -are implemented as ``ModulePass`` classes, so if your alias analysis uses -``FunctionPass``, it won't be able to use these utilities. If you try to use -them, the pass manager will silently route alias analysis queries directly to -``BasicAliasAnalysis`` instead. +``AliasAnalysisCounter`` (``-count-aa``) are implemented as ``ModulePass`` +classes, so if your alias analysis uses ``FunctionPass``, it won't be able to +use these utilities. If you try to use them, the pass manager will silently +route alias analysis queries directly to ``BasicAliasAnalysis`` instead. Similarly, the ``opt -p`` option introduces ``ModulePass`` passes between each pass, which prevents the use of ``FunctionPass`` alias analysis passes. diff --git a/docs/Atomics.rst b/docs/Atomics.rst index 9068df46b023..79ab74792dd4 100644 --- a/docs/Atomics.rst +++ b/docs/Atomics.rst @@ -446,7 +446,7 @@ It is often easiest for backends to use AtomicExpandPass to lower some of the atomic constructs. Here are some lowerings it can do: * cmpxchg -> loop with load-linked/store-conditional - by overriding ``hasLoadLinkedStoreConditional()``, ``emitLoadLinked()``, + by overriding ``shouldExpandAtomicCmpXchgInIR()``, ``emitLoadLinked()``, ``emitStoreConditional()`` * large loads/stores -> ll-sc/cmpxchg by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()`` diff --git a/docs/BitCodeFormat.rst b/docs/BitCodeFormat.rst index 25ea421ed083..d6e3099bdb63 100644 --- a/docs/BitCodeFormat.rst +++ b/docs/BitCodeFormat.rst @@ -756,6 +756,7 @@ function. The operand fields are: * ``anyregcc``: code 13 * ``preserve_mostcc``: code 14 * ``preserve_allcc``: code 15 + * ``cxx_fast_tlscc``: code 17 * ``x86_stdcallcc``: code 64 * ``x86_fastcallcc``: code 65 * ``arm_apcscc``: code 66 @@ -851,7 +852,7 @@ in the *paramattr* field of module block `FUNCTION`_ records, or within the *attr* field of function block ``INST_INVOKE`` and ``INST_CALL`` records. Entries within ``PARAMATTR_BLOCK`` are constructed to ensure that each is unique -(i.e., no two indicies represent equivalent attribute lists). +(i.e., no two indices represent equivalent attribute lists). .. _PARAMATTR_CODE_ENTRY: @@ -904,7 +905,7 @@ table entry, which may be referenced by 0-based index from instructions, constants, metadata, type symbol table entries, or other type operator records. Entries within ``TYPE_BLOCK`` are constructed to ensure that each entry is -unique (i.e., no two indicies represent structurally equivalent types). +unique (i.e., no two indices represent structurally equivalent types). .. _TYPE_CODE_NUMENTRY: .. _NUMENTRY: diff --git a/docs/BitSets.rst b/docs/BitSets.rst index c6ffdbdb8a11..18dbf6df563f 100644 --- a/docs/BitSets.rst +++ b/docs/BitSets.rst @@ -10,17 +10,41 @@ for the type of the class or its derived classes. To use the mechanism, a client creates a global metadata node named ``llvm.bitsets``. Each element is a metadata node with three elements: -the first is a metadata string containing an identifier for the bitset, -the second is a global variable and the third is a byte offset into the -global variable. + +1. 
a metadata object representing an identifier for the bitset +2. either a global variable or a function +3. a byte offset into the global (generally zero for functions) + +Each bitset must exclusively contain either global variables or functions. + +.. admonition:: Limitation + + The current implementation only supports functions as members of bitsets on + the x86-32 and x86-64 architectures. This will cause a link-time optimization pass to generate bitsets from the -memory addresses referenced from the elements of the bitset metadata. The pass -will lay out the referenced globals consecutively, so their definitions must -be available at LTO time. The `GlobalLayoutBuilder`_ class is responsible for -laying out the globals efficiently to minimize the sizes of the underlying -bitsets. An intrinsic, :ref:`llvm.bitset.test <bitset.test>`, generates code -to test whether a given pointer is a member of a bitset. +memory addresses referenced from the elements of the bitset metadata. The +pass will lay out referenced global variables consecutively, so their +definitions must be available at LTO time. + +A bit set containing functions is transformed into a jump table, which +is a block of code consisting of one branch instruction for each of the +functions in the bit set that branches to the target function, and redirect +any taken function addresses to the corresponding jump table entry. In the +object file's symbol table, the jump table entries take the identities of +the original functions, so that addresses taken outside the module will pass +any verification done inside the module. + +Jump tables may call external functions, so their definitions need not +be available at LTO time. Note that if an externally defined function is a +member of a bitset, there is no guarantee that its identity within the module +will be the same as its identity outside of the module, as the former will +be the jump table entry if a jump table is necessary. + +The `GlobalLayoutBuilder`_ class is responsible for laying out the globals +efficiently to minimize the sizes of the underlying bitsets. An intrinsic, +:ref:`llvm.bitset.test <bitset.test>`, generates code to test whether a +given pointer is a member of a bitset. :Example: @@ -33,13 +57,25 @@ to test whether a given pointer is a member of a bitset. @c = internal global i32 0 @d = internal global [2 x i32] [i32 0, i32 0] - !llvm.bitsets = !{!0, !1, !2, !3, !4} + define void @e() { + ret void + } + + define void @f() { + ret void + } + + declare void @g() + + !llvm.bitsets = !{!0, !1, !2, !3, !4, !5, !6} !0 = !{!"bitset1", i32* @a, i32 0} !1 = !{!"bitset1", i32* @b, i32 0} !2 = !{!"bitset2", i32* @b, i32 0} !3 = !{!"bitset2", i32* @c, i32 0} !4 = !{!"bitset2", i32* @d, i32 4} + !5 = !{!"bitset3", void ()* @e, i32 0} + !6 = !{!"bitset3", void ()* @g, i32 0} declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone @@ -55,6 +91,12 @@ to test whether a given pointer is a member of a bitset. ret i1 %x } + define i1 @baz(void ()* %p) { + %pi8 = bitcast void ()* %p to i8* + %x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset3") + ret i1 %x + } + define void @main() { %a1 = call i1 @foo(i32* @a) ; returns 1 %b1 = call i1 @foo(i32* @b) ; returns 1 @@ -64,6 +106,9 @@ to test whether a given pointer is a member of a bitset. 
%c2 = call i1 @bar(i32* @c) ; returns 1 %d02 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 0)) ; returns 0 %d12 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 1)) ; returns 1 + %e = call i1 @baz(void ()* @e) ; returns 1 + %f = call i1 @baz(void ()* @f) ; returns 0 + %g = call i1 @baz(void ()* @g) ; returns 1 ret void } diff --git a/docs/BranchWeightMetadata.rst b/docs/BranchWeightMetadata.rst index 2ebc6c32416e..6cbcb0f0fb24 100644 --- a/docs/BranchWeightMetadata.rst +++ b/docs/BranchWeightMetadata.rst @@ -27,7 +27,7 @@ Supported Instructions ^^^^^^^^^^^^^^ Metadata is only assigned to the conditional branches. There are two extra -operarands for the true and the false branch. +operands for the true and the false branch. .. code-block:: llvm @@ -114,12 +114,12 @@ CFG Modifications Branch Weight Metatada is not proof against CFG changes. If terminator operands' are changed some action should be taken. In other case some misoptimizations may -occur due to incorrent branch prediction information. +occur due to incorrect branch prediction information. Function Entry Counts ===================== -To allow comparing different functions durint inter-procedural analysis and +To allow comparing different functions during inter-procedural analysis and optimization, ``MD_prof`` nodes can also be assigned to a function definition. The first operand is a string indicating the name of the associated counter. diff --git a/docs/BuildingLLVMWithAutotools.rst b/docs/BuildingLLVMWithAutotools.rst index 6f9a13410555..083ead67ebb6 100644 --- a/docs/BuildingLLVMWithAutotools.rst +++ b/docs/BuildingLLVMWithAutotools.rst @@ -5,6 +5,12 @@ Building LLVM With Autotools .. contents:: :local: +.. warning:: + + Building LLVM with autoconf is deprecated as of 3.8. The autoconf build + system will be removed in 3.9. Please migrate to using CMake. For more + information see: `Building LLVM with CMake <CMake.html>`_ + Overview ======== diff --git a/docs/CMake.rst b/docs/CMake.rst index 909fc04248c7..38199e5cc587 100644 --- a/docs/CMake.rst +++ b/docs/CMake.rst @@ -10,11 +10,11 @@ Introduction `CMake <http://www.cmake.org/>`_ is a cross-platform build-generator tool. CMake does not build the project, it generates the files needed by your build tool -(GNU make, Visual Studio, etc) for building LLVM. +(GNU make, Visual Studio, etc.) for building LLVM. If you are really anxious about getting a functional LLVM build, go to the -`Quick start`_ section. If you are a CMake novice, start on `Basic CMake usage`_ -and then go back to the `Quick start`_ once you know what you are doing. The +`Quick start`_ section. If you are a CMake novice, start with `Basic CMake usage`_ +and then go back to the `Quick start`_ section once you know what you are doing. The `Options and variables`_ section is a reference for customizing your build. If you already have experience with CMake, this is the recommended starting point. @@ -31,35 +31,35 @@ We use here the command-line, non-interactive CMake interface. #. Open a shell. Your development tools must be reachable from this shell through the PATH environment variable. -#. Create a directory for containing the build. It is not supported to build - LLVM on the source directory. cd to this directory: +#. Create a build directory. Building LLVM in the source + directory is not supported. cd to this directory: .. code-block:: console $ mkdir mybuilddir $ cd mybuilddir -#. Execute this command on the shell replacing `path/to/llvm/source/root` with +#. 
Execute this command in the shell replacing `path/to/llvm/source/root` with the path to the root of your LLVM source tree: .. code-block:: console $ cmake path/to/llvm/source/root - CMake will detect your development environment, perform a series of test and + CMake will detect your development environment, perform a series of tests, and generate the files required for building LLVM. CMake will use default values for all build parameters. See the `Options and variables`_ section for - fine-tuning your build + a list of build parameters that you can modify. This can fail if CMake can't detect your toolset, or if it thinks that the - environment is not sane enough. On this case make sure that the toolset that - you intend to use is the only one reachable from the shell and that the shell - itself is the correct one for you development environment. CMake will refuse + environment is not sane enough. In this case, make sure that the toolset that + you intend to use is the only one reachable from the shell, and that the shell + itself is the correct one for your development environment. CMake will refuse to build MinGW makefiles if you have a POSIX shell reachable through the PATH environment variable, for instance. You can force CMake to use a given build - tool, see the `Usage`_ section. + tool; for instructions, see the `Usage`_ section, below. -#. After CMake has finished running, proceed to use IDE project files or start +#. After CMake has finished running, proceed to use IDE project files, or start the build from the build directory: .. code-block:: console @@ -67,9 +67,9 @@ We use here the command-line, non-interactive CMake interface. $ cmake --build . The ``--build`` option tells ``cmake`` to invoke the underlying build - tool (``make``, ``ninja``, ``xcodebuild``, ``msbuild``, etc). + tool (``make``, ``ninja``, ``xcodebuild``, ``msbuild``, etc.) - The underlying build tool can be invoked directly either of course, but + The underlying build tool can be invoked directly, of course, but the ``--build`` option is portable. #. After LLVM has finished building, install it from the build directory: @@ -95,33 +95,39 @@ We use here the command-line, non-interactive CMake interface. Basic CMake usage ================= -This section explains basic aspects of CMake, mostly for explaining those -options which you may need on your day-to-day usage. +This section explains basic aspects of CMake +which you may need in your day-to-day usage. -CMake comes with extensive documentation in the form of html files and on the -cmake executable itself. Execute ``cmake --help`` for further help options. +CMake comes with extensive documentation, in the form of html files, and as +online help accessible via the ``cmake`` executable itself. Execute ``cmake +--help`` for further help options. -CMake requires to know for which build tool it shall generate files (GNU make, -Visual Studio, Xcode, etc). If not specified on the command line, it tries to -guess it based on you environment. Once identified the build tool, CMake uses -the corresponding *Generator* for creating files for your build tool. You can +CMake allows you to specify a build tool (e.g., GNU make, Visual Studio, +or Xcode). If not specified on the command line, CMake tries to guess which +build tool to use, based on your environment. Once it has identified your +build tool, CMake uses the corresponding *Generator* to create files for your +build tool (e.g., Makefiles or Visual Studio or Xcode project files). 
You can explicitly specify the generator with the command line option ``-G "Name of the -generator"``. For knowing the available generators on your platform, execute +generator"``. To see a list of the available generators on your system, execute .. code-block:: console $ cmake --help -This will list the generator's names at the end of the help text. Generator's -names are case-sensitive. Example: +This will list the generator names at the end of the help text. + +Generators' names are case-sensitive, and may contain spaces. For this reason, +you should enter them exactly as they are listed in the ``cmake --help`` +output, in quotes. For example, to generate project files specifically for +Visual Studio 12, you can execute: .. code-block:: console - $ cmake -G "Visual Studio 11" path/to/llvm/source/root + $ cmake -G "Visual Studio 12" path/to/llvm/source/root For a given development platform there can be more than one adequate -generator. If you use Visual Studio "NMake Makefiles" is a generator you can use -for building with NMake. By default, CMake chooses the more specific generator +generator. If you use Visual Studio, "NMake Makefiles" is a generator you can use +for building with NMake. By default, CMake chooses the most specific generator supported by your development environment. If you want an alternative generator, you must tell this to CMake with the ``-G`` option. @@ -142,18 +148,20 @@ CMake command line like this: $ cmake -DVARIABLE=value path/to/llvm/source -You can set a variable after the initial CMake invocation for changing its +You can set a variable after the initial CMake invocation to change its value. You can also undefine a variable: .. code-block:: console $ cmake -UVARIABLE path/to/llvm/source -Variables are stored on the CMake cache. This is a file named ``CMakeCache.txt`` -on the root of the build directory. Do not hand-edit it. +Variables are stored in the CMake cache. This is a file named ``CMakeCache.txt`` +stored at the root of your build directory that is generated by ``cmake``. +Editing it yourself is not recommended. -Variables are listed here appending its type after a colon. It is correct to -write the variable and the type on the CMake command line: +Variables are listed in the CMake cache and later in this document with +the variable name and type separated by a colon. You can also specify the +variable and type on the CMake command line: .. code-block:: console @@ -163,17 +171,17 @@ Frequently-used CMake variables ------------------------------- Here are some of the CMake variables that are used often, along with a -brief explanation and LLVM-specific notes. For full documentation, check the -CMake docs or execute ``cmake --help-variable VARIABLE_NAME``. +brief explanation and LLVM-specific notes. For full documentation, consult the +CMake manual, or execute ``cmake --help-variable VARIABLE_NAME``. **CMAKE_BUILD_TYPE**:STRING - Sets the build type for ``make`` based generators. Possible values are - Release, Debug, RelWithDebInfo and MinSizeRel. On systems like Visual Studio - the user sets the build type with the IDE settings. + Sets the build type for ``make``-based generators. Possible values are + Release, Debug, RelWithDebInfo and MinSizeRel. If you are using an IDE such as + Visual Studio, you should use the IDE settings to set the build type. **CMAKE_INSTALL_PREFIX**:PATH Path where LLVM will be installed if "make install" is invoked or the - "INSTALL" target is built. + "install" target is built. 
**LLVM_LIBDIR_SUFFIX**:STRING Extra suffix to append to the directory where libraries are to be @@ -188,8 +196,9 @@ CMake docs or execute ``cmake --help-variable VARIABLE_NAME``. **BUILD_SHARED_LIBS**:BOOL Flag indicating if shared libraries will be built. Its default value is - OFF. Shared libraries are not supported on Windows and not recommended on the - other OSes. + OFF. This option is only recommended for use by LLVM developers. + On Windows, shared libraries may be used when building with MinGW, including + mingw-w64, but not when building with the Microsoft toolchain. .. _LLVM-specific variables: @@ -203,13 +212,13 @@ LLVM-specific variables **LLVM_BUILD_TOOLS**:BOOL Build LLVM tools. Defaults to ON. Targets for building each tool are generated - in any case. You can build an tool separately by invoking its target. For - example, you can build *llvm-as* with a makefile-based system executing *make - llvm-as* on the root of your build directory. + in any case. You can build a tool separately by invoking its target. For + example, you can build *llvm-as* with a Makefile-based system by executing *make + llvm-as* at the root of your build directory. **LLVM_INCLUDE_TOOLS**:BOOL - Generate build targets for the LLVM tools. Defaults to ON. You can use that - option for disabling the generation of build targets for the LLVM tools. + Generate build targets for the LLVM tools. Defaults to ON. You can use this + option to disable the generation of build targets for the LLVM tools. **LLVM_BUILD_EXAMPLES**:BOOL Build LLVM examples. Defaults to OFF. Targets for building each example are @@ -217,20 +226,20 @@ LLVM-specific variables details. **LLVM_INCLUDE_EXAMPLES**:BOOL - Generate build targets for the LLVM examples. Defaults to ON. You can use that - option for disabling the generation of build targets for the LLVM examples. + Generate build targets for the LLVM examples. Defaults to ON. You can use this + option to disable the generation of build targets for the LLVM examples. **LLVM_BUILD_TESTS**:BOOL Build LLVM unit tests. Defaults to OFF. Targets for building each unit test - are generated in any case. You can build a specific unit test with the target - *UnitTestNameTests* (where at this time *UnitTestName* can be ADT, Analysis, - ExecutionEngine, JIT, Support, Transform, VMCore; see the subdirectories of - *unittests* for an updated list.) It is possible to build all unit tests with - the target *UnitTests*. + are generated in any case. You can build a specific unit test using the + targets defined under *unittests*, such as ADTTests, IRTests, SupportTests, + etc. (Search for ``add_llvm_unittest`` in the subdirectories of *unittests* + for a complete list of unit tests.) It is possible to build all unit tests + with the target *UnitTests*. **LLVM_INCLUDE_TESTS**:BOOL Generate build targets for the LLVM unit tests. Defaults to ON. You can use - that option for disabling the generation of build targets for the LLVM unit + this option to disable the generation of build targets for the LLVM unit tests. **LLVM_APPEND_VC_REV**:BOOL @@ -249,39 +258,39 @@ LLVM-specific variables is *Debug*. **LLVM_ENABLE_EH**:BOOL - Build LLVM with exception handling support. This is necessary if you wish to + Build LLVM with exception-handling support. This is necessary if you wish to link against LLVM libraries and make use of C++ exceptions in your own code that need to propagate through LLVM code. Defaults to OFF. 
**LLVM_ENABLE_PIC**:BOOL - Add the ``-fPIC`` flag for the compiler command-line, if the compiler supports + Add the ``-fPIC`` flag to the compiler command-line, if the compiler supports this flag. Some systems, like Windows, do not need this flag. Defaults to ON. **LLVM_ENABLE_RTTI**:BOOL - Build LLVM with run time type information. Defaults to OFF. + Build LLVM with run-time type information. Defaults to OFF. **LLVM_ENABLE_WARNINGS**:BOOL Enable all compiler warnings. Defaults to ON. **LLVM_ENABLE_PEDANTIC**:BOOL - Enable pedantic mode. This disables compiler specific extensions, if + Enable pedantic mode. This disables compiler-specific extensions, if possible. Defaults to ON. **LLVM_ENABLE_WERROR**:BOOL - Stop and fail build, if a compiler warning is triggered. Defaults to OFF. + Stop and fail the build, if a compiler warning is triggered. Defaults to OFF. **LLVM_ABI_BREAKING_CHECKS**:STRING Used to decide if LLVM should be built with ABI breaking checks or not. Allowed values are `WITH_ASSERTS` (default), `FORCE_ON` and `FORCE_OFF`. `WITH_ASSERTS` turns on ABI breaking checks in an assertion enabled build. `FORCE_ON` (`FORCE_OFF`) turns them on - (off) irrespective of whether normal (`NDEBUG` based) assertions are + (off) irrespective of whether normal (`NDEBUG`-based) assertions are enabled or not. A version of LLVM built with ABI breaking checks is not ABI compatible with a version built without it. **LLVM_BUILD_32_BITS**:BOOL - Build 32-bits executables and libraries on 64-bits systems. This option is - available only on some 64-bits unix systems. Defaults to OFF. + Build 32-bit executables and libraries on 64-bit systems. This option is + available only on some 64-bit Unix systems. Defaults to OFF. **LLVM_TARGET_ARCH**:STRING LLVM target to use for native code generation. This is required for JIT @@ -290,7 +299,7 @@ LLVM-specific variables to the target architecture name. **LLVM_TABLEGEN**:STRING - Full path to a native TableGen executable (usually named ``tblgen``). This is + Full path to a native TableGen executable (usually named ``llvm-tblgen``). This is intended for cross-compiling: if the user sets this variable, no native TableGen will be created. @@ -300,29 +309,40 @@ LLVM-specific variables others. **LLVM_LIT_TOOLS_DIR**:PATH - The path to GnuWin32 tools for tests. Valid on Windows host. Defaults to "", - then Lit seeks tools according to %PATH%. Lit can find tools(eg. grep, sort, - &c) on LLVM_LIT_TOOLS_DIR at first, without specifying GnuWin32 to %PATH%. + The path to GnuWin32 tools for tests. Valid on Windows host. Defaults to + the empty string, in which case lit will look for tools needed for tests + (e.g. ``grep``, ``sort``, etc.) in your %PATH%. If GnuWin32 is not in your + %PATH%, then you can set this variable to the GnuWin32 directory so that + lit can find tools needed for tests in that directory. **LLVM_ENABLE_FFI**:BOOL - Indicates whether LLVM Interpreter will be linked with Foreign Function - Interface library. If the library or its headers are installed on a custom - location, you can set the variables FFI_INCLUDE_DIR and - FFI_LIBRARY_DIR. Defaults to OFF. + Indicates whether the LLVM Interpreter will be linked with the Foreign Function + Interface library (libffi) in order to enable calling external functions. + If the library or its headers are installed in a custom + location, you can also set the variables FFI_INCLUDE_DIR and + FFI_LIBRARY_DIR to the directories where ffi.h and libffi.so can be found, + respectively. Defaults to OFF. 
**LLVM_EXTERNAL_{CLANG,LLD,POLLY}_SOURCE_DIR**:PATH - Path to ``{Clang,lld,Polly}``\'s source directory. Defaults to - ``tools/{clang,lld,polly}``. ``{Clang,lld,Polly}`` will not be built when it - is empty or it does not point to a valid path. + These variables specify the path to the source directory for the external + LLVM projects Clang, lld, and Polly, respectively, relative to the top-level + source directory. If the in-tree subdirectory for an external project + exists (e.g., llvm/tools/clang for Clang), then the corresponding variable + will not be used. If the variable for an external project does not point + to a valid path, then that project will not be built. **LLVM_USE_OPROFILE**:BOOL - Enable building OProfile JIT support. Defaults to OFF + Enable building OProfile JIT support. Defaults to OFF. + +**LLVM_PROFDATA_FILE**:PATH + Path to a profdata file to pass into clang's -fprofile-instr-use flag. This + can only be specified if you're building with clang. **LLVM_USE_INTEL_JITEVENTS**:BOOL - Enable building support for Intel JIT Events API. Defaults to OFF + Enable building support for Intel JIT Events API. Defaults to OFF. **LLVM_ENABLE_ZLIB**:BOOL - Build with zlib to support compression/uncompression in LLVM tools. + Enable building with zlib to support compression/uncompression in LLVM tools. Defaults to ON. **LLVM_USE_SANITIZER**:STRING @@ -361,14 +381,14 @@ LLVM-specific variables ``org.llvm.qch``. This option is only useful in combination with ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; - otherwise this has no effect. + otherwise it has no effect. **LLVM_DOXYGEN_QHP_NAMESPACE**:STRING Namespace under which the intermediate Qt Help Project file lives. See `Qt Help Project`_ for more information. Defaults to "org.llvm". This option is only useful in combination with ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise - this has no effect. + it has no effect. **LLVM_DOXYGEN_QHP_CUST_FILTER_NAME**:STRING See `Qt Help Project`_ for @@ -377,14 +397,14 @@ LLVM-specific variables be used in Qt Creator to select only documentation from LLVM when browsing through all the help files that you might have loaded. This option is only useful in combination with ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; - otherwise this has no effect. + otherwise it has no effect. .. _Qt Help Project: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-filters **LLVM_DOXYGEN_QHELPGENERATOR_PATH**:STRING The path to the ``qhelpgenerator`` executable. Defaults to whatever CMake's ``find_program()`` can find. This option is only useful in combination with - ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise this has no + ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise it has no effect. **LLVM_DOXYGEN_SVG**:BOOL @@ -416,18 +436,24 @@ LLVM-specific variables If enabled then sphinx documentation warnings will be treated as errors. Defaults to ON. +**LLVM_CREATE_XCODE_TOOLCHAIN**:BOOL + OS X Only: If enabled CMake will generate a target named + 'install-xcode-toolchain'. This target will create a directory at + $CMAKE_INSTALL_PREFIX/Toolchains containing an xctoolchain directory which can + be used to override the default system tools. + Executing the test suite ======================== -Testing is performed when the *check* target is built. For instance, if you are -using makefiles, execute this command while on the top level of your build -directory: +Testing is performed when the *check-all* target is built. For instance, if you are +using Makefiles, execute this command in the root of your build directory: .. 
code-block:: console - $ make check + $ make check-all -On Visual Studio, you may run tests to build the project "check". +On Visual Studio, you may run tests by building the project "check-all". +For more information about testing, see the :doc:`TestingGuide`. Cross compiling =============== @@ -447,10 +473,10 @@ Embedding LLVM in your project From LLVM 3.5 onwards both the CMake and autoconf/Makefile build systems export LLVM libraries as importable CMake targets. This means that clients of LLVM can -now reliably use CMake to develop their own LLVM based projects against an +now reliably use CMake to develop their own LLVM-based projects against an installed version of LLVM regardless of how it was built. -Here is a simple example of CMakeLists.txt file that imports the LLVM libraries +Here is a simple example of a CMakeLists.txt file that imports the LLVM libraries and uses them to build a simple application ``simple-tool``. .. code-block:: cmake @@ -495,8 +521,8 @@ This file is available in two different locations. On Linux typically this is ``/usr/share/llvm/cmake/LLVMConfig.cmake``. * ``<LLVM_BUILD_ROOT>/share/llvm/cmake/LLVMConfig.cmake`` where - ``<LLVM_BUILD_ROOT>`` is the root of the LLVM build tree. **Note this only - available when building LLVM with CMake** + ``<LLVM_BUILD_ROOT>`` is the root of the LLVM build tree. **Note: this is only + available when building LLVM with CMake.** If LLVM is installed in your operating system's normal installation prefix (e.g. on Linux this is usually ``/usr/``) ``find_package(LLVM ...)`` will @@ -529,7 +555,7 @@ include A list of include paths to directories containing LLVM header files. ``LLVM_PACKAGE_VERSION`` - The LLVM version. This string can be used with CMake conditionals. E.g. ``if + The LLVM version. This string can be used with CMake conditionals, e.g., ``if (${LLVM_PACKAGE_VERSION} VERSION_LESS "3.5")``. ``LLVM_TOOLS_BINARY_DIR`` @@ -582,7 +608,7 @@ Contents of ``<project dir>/<pass name>/CMakeLists.txt``: Note if you intend for this pass to be merged into the LLVM source tree at some point in the future it might make more sense to use LLVM's internal -add_llvm_loadable_module function instead by... +``add_llvm_loadable_module`` function instead by... Adding the following to ``<project dir>/CMakeLists.txt`` (after @@ -602,7 +628,7 @@ And then changing ``<project dir>/<pass name>/CMakeLists.txt`` to ) When you are done developing your pass, you may wish to integrate it -into LLVM source tree. You can achieve it in two easy steps: +into the LLVM source tree. You can achieve it in two easy steps: #. Copying ``<pass name>`` folder into ``<LLVM root>/lib/Transform`` directory. @@ -618,6 +644,6 @@ Microsoft Visual C++ -------------------- **LLVM_COMPILER_JOBS**:STRING - Specifies the maximum number of parallell compiler jobs to use per project + Specifies the maximum number of parallel compiler jobs to use per project when building with msbuild or Visual Studio. Only supported for the Visual Studio 2010 CMake generator. 0 means use all processors. Default is 0. 
diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt index 2388a92d39ef..eaa175062b61 100644 --- a/docs/CMakeLists.txt +++ b/docs/CMakeLists.txt @@ -147,7 +147,9 @@ if( NOT uses_ocaml LESS 0 ) COMMAND ${CMAKE_COMMAND} -E remove_directory ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html COMMAND ${OCAMLFIND} ocamldoc -d ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html - -sort -colorize-code -html ${odoc_files}) + -sort -colorize-code -html ${odoc_files} + COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/_ocamldoc/style.css + ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html) add_dependencies(ocaml_doc ${doc_targets}) diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst index 03f5cbd726d8..f3b949c7ad15 100644 --- a/docs/CodeGenerator.rst +++ b/docs/CodeGenerator.rst @@ -640,7 +640,7 @@ For target specific directives, the MCStreamer has a MCTargetStreamer instance. Each target that needs it defines a class that inherits from it and is a lot like MCStreamer itself: It has one method per directive and two classes that inherit from it, a target object streamer and a target asm streamer. The target -asm streamer just prints it (``emitFnStart -> .fnstrart``), and the object +asm streamer just prints it (``emitFnStart -> .fnstart``), and the object streamer implement the assembler logic for it. To make llvm use these classes, the target initialization must call diff --git a/docs/CodingStandards.rst b/docs/CodingStandards.rst index de4f73c546b5..91faadffea62 100644 --- a/docs/CodingStandards.rst +++ b/docs/CodingStandards.rst @@ -39,7 +39,7 @@ hand, it is reasonable to rename the methods of a class if you're about to change it in some other way. Just do the reformating as a separate commit from the functionality change. -The ultimate goal of these guidelines is the increase readability and +The ultimate goal of these guidelines is to increase the readability and maintainability of our common source base. If you have suggestions for topics to be included, please mail them to `Chris <mailto:sabre@nondot.org>`_. @@ -178,8 +178,6 @@ being aware of: * While most of the atomics library is well implemented, the fences are missing. Fortunately, they are rarely needed. * The locale support is incomplete. -* ``std::equal()`` (and other algorithms) incorrectly assert in MSVC when given - ``nullptr`` as an iterator. Other than these areas you should assume the standard library is available and working as expected until some build bot tells you otherwise. If you're in an diff --git a/docs/CommandGuide/index.rst b/docs/CommandGuide/index.rst index ed18cd048aa5..46db57f1c845 100644 --- a/docs/CommandGuide/index.rst +++ b/docs/CommandGuide/index.rst @@ -21,6 +21,7 @@ Basic Commands lli llvm-link llvm-ar + llvm-lib llvm-nm llvm-config llvm-diff diff --git a/docs/CommandGuide/lit.rst b/docs/CommandGuide/lit.rst index e820eef2faff..0ec14bb2236e 100644 --- a/docs/CommandGuide/lit.rst +++ b/docs/CommandGuide/lit.rst @@ -80,6 +80,11 @@ OUTPUT OPTIONS Show more information on test failures, for example the entire test output instead of just the test result. +.. option:: -a, --show-all + + Show more information about all tests, for example the entire test + commandline and output. + .. option:: --no-progress-bar Do not use curses based progress bar. 
diff --git a/docs/CommandGuide/llc.rst b/docs/CommandGuide/llc.rst index 8d5c9ce8f8a1..5094259f9f95 100644 --- a/docs/CommandGuide/llc.rst +++ b/docs/CommandGuide/llc.rst @@ -127,6 +127,12 @@ End-user Options implements an LLVM target. This will permit the target name to be used with the :option:`-march` option so that code can be generated for that target. +.. option:: -meabi=[default|gnu|4|5] + + Specify which EABI version should conform to. Valid EABI versions are *gnu*, + *4* and *5*. Default value (*default*) depends on the triple. + + Tuning/Configuration Options ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/CommandGuide/lli.rst b/docs/CommandGuide/lli.rst index 502fbd609353..9da13ee47e0e 100644 --- a/docs/CommandGuide/lli.rst +++ b/docs/CommandGuide/lli.rst @@ -1,172 +1,127 @@ lli - directly execute programs from LLVM bitcode ================================================= - SYNOPSIS -------- - -**lli** [*options*] [*filename*] [*program args*] - +:program:`lli` [*options*] [*filename*] [*program args*] DESCRIPTION ----------- +:program:`lli` directly executes programs in LLVM bitcode format. It takes a program +in LLVM bitcode format and executes it using a just-in-time compiler or an +interpreter. -**lli** directly executes programs in LLVM bitcode format. It takes a program -in LLVM bitcode format and executes it using a just-in-time compiler, if one is -available for the current architecture, or an interpreter. **lli** takes all of -the same code generator options as llc|llc, but they are only effective when -**lli** is using the just-in-time compiler. +:program:`lli` is *not* an emulator. It will not execute IR of different architectures +and it can only interpret (or JIT-compile) for the host architecture. -If *filename* is not specified, then **lli** reads the LLVM bitcode for the +The JIT compiler takes the same arguments as other tools, like :program:`llc`, +but they don't necessarily work for the interpreter. + +If `filename` is not specified, then :program:`lli` reads the LLVM bitcode for the program from standard input. The optional *args* specified on the command line are passed to the program as arguments. - GENERAL OPTIONS --------------- - - -**-fake-argv0**\ =\ *executable* +.. option:: -fake-argv0=executable Override the ``argv[0]`` value passed into the executing program. - - -**-force-interpreter**\ =\ *{false,true}* +.. option:: -force-interpreter={false,true} If set to true, use the interpreter even if a just-in-time compiler is available for this architecture. Defaults to false. - - -**-help** +.. option:: -help Print a summary of command line options. +.. option:: -load=pluginfilename - -**-load**\ =\ *pluginfilename* - - Causes **lli** to load the plugin (shared object) named *pluginfilename* and use + Causes :program:`lli` to load the plugin (shared object) named *pluginfilename* and use it for optimization. - - -**-stats** +.. option:: -stats Print statistics from the code-generation passes. This is only meaningful for the just-in-time compiler, at present. - - -**-time-passes** +.. option:: -time-passes Record the amount of time needed for each code-generation pass and print it to standard error. +.. option:: -version - -**-version** - - Print out the version of **lli** and exit without doing anything else. - - - + Print out the version of :program:`lli` and exit without doing anything else. TARGET OPTIONS -------------- - - -**-mtriple**\ =\ *target triple* +.. 
option:: -mtriple=target triple Override the target triple specified in the input bitcode file with the specified string. This may result in a crash if you pick an architecture which is not compatible with the current system. - - -**-march**\ =\ *arch* +.. option:: -march=arch Specify the architecture for which to generate assembly, overriding the target encoded in the bitcode file. See the output of **llc -help** for a list of valid architectures. By default this is inferred from the target triple or autodetected to the current architecture. - - -**-mcpu**\ =\ *cpuname* +.. option:: -mcpu=cpuname Specify a specific chip in the current architecture to generate code for. By default this is inferred from the target triple and autodetected to the current architecture. For a list of available CPUs, use: **llvm-as < /dev/null | llc -march=xyz -mcpu=help** - - -**-mattr**\ =\ *a1,+a2,-a3,...* +.. option:: -mattr=a1,+a2,-a3,... Override or control specific attributes of the target, such as whether SIMD operations are enabled or not. The default set of attributes is set by the current CPU. For a list of available attributes, use: **llvm-as < /dev/null | llc -march=xyz -mattr=help** - - - FLOATING POINT OPTIONS ---------------------- - - -**-disable-excess-fp-precision** +.. option:: -disable-excess-fp-precision Disable optimizations that may increase floating point precision. - - -**-enable-no-infs-fp-math** +.. option:: -enable-no-infs-fp-math Enable optimizations that assume no Inf values. - - -**-enable-no-nans-fp-math** +.. option:: -enable-no-nans-fp-math Enable optimizations that assume no NAN values. +.. option:: -enable-unsafe-fp-math - -**-enable-unsafe-fp-math** - - Causes **lli** to enable optimizations that may decrease floating point + Causes :program:`lli` to enable optimizations that may decrease floating point precision. +.. option:: -soft-float - -**-soft-float** - - Causes **lli** to generate software floating point library calls instead of + Causes :program:`lli` to generate software floating point library calls instead of equivalent hardware instructions. - - - CODE GENERATION OPTIONS ----------------------- - - -**-code-model**\ =\ *model* +.. option:: -code-model=model Choose the code model from: - .. code-block:: perl default: Target default code model @@ -175,42 +130,30 @@ CODE GENERATION OPTIONS medium: Medium code model large: Large code model - - - -**-disable-post-RA-scheduler** +.. option:: -disable-post-RA-scheduler Disable scheduling after register allocation. - - -**-disable-spill-fusing** +.. option:: -disable-spill-fusing Disable fusing of spill code into instructions. - - -**-jit-enable-eh** +.. option:: -jit-enable-eh Exception handling should be enabled in the just-in-time compiler. - - -**-join-liveintervals** +.. option:: -join-liveintervals Coalesce copies (default=true). +.. option:: -nozero-initialized-in-bss + Don't place zero-initialized symbols into the BSS section. -**-nozero-initialized-in-bss** Don't place zero-initialized symbols into the BSS section. - - - -**-pre-RA-sched**\ =\ *scheduler* +.. option:: -pre-RA-sched=scheduler Instruction schedulers available (before register allocation): - .. code-block:: perl =default: Best scheduler for the target @@ -221,74 +164,51 @@ CODE GENERATION OPTIONS =list-tdrr: Top-down register reduction list scheduling =list-td: Top-down list scheduler -print-machineinstrs - Print generated machine code - - - -**-regalloc**\ =\ *allocator* +.. 
option:: -regalloc=allocator Register allocator to use (default=linearscan) - .. code-block:: perl =bigblock: Big-block register allocator =linearscan: linear scan register allocator =local - local register allocator =simple: simple register allocator - - - -**-relocation-model**\ =\ *model* +.. option:: -relocation-model=model Choose relocation model from: - .. code-block:: perl =default: Target default relocation model =static: Non-relocatable code =pic - Fully relocatable, position independent code =dynamic-no-pic: Relocatable external references, non-relocatable code - - - -**-spiller** +.. option:: -spiller Spiller to use (default=local) - .. code-block:: perl =simple: simple spiller =local: local spiller - - - -**-x86-asm-syntax**\ =\ *syntax* +.. option:: -x86-asm-syntax=syntax Choose style of code to emit from X86 backend: - .. code-block:: perl =att: Emit AT&T-style assembly =intel: Emit Intel-style assembly - - - - EXIT STATUS ----------- - -If **lli** fails to load the program, it will exit with an exit code of 1. +If :program:`lli` fails to load the program, it will exit with an exit code of 1. Otherwise, it will return the exit code of the program it executes. - SEE ALSO -------- - -llc|llc +:program:`llc` diff --git a/docs/CommandGuide/llvm-lib.rst b/docs/CommandGuide/llvm-lib.rst new file mode 100644 index 000000000000..ecd0a7db7e37 --- /dev/null +++ b/docs/CommandGuide/llvm-lib.rst @@ -0,0 +1,31 @@ +llvm-lib - LLVM lib.exe compatible library tool +=============================================== + + +SYNOPSIS +-------- + + +**llvm-lib** [/libpath:<path>] [/out:<output>] [/llvmlibthin] +[/ignore] [/machine] [/nologo] [files...] + + +DESCRIPTION +----------- + + +The **llvm-lib** command is intended to be a ``lib.exe`` compatible +tool. See https://msdn.microsoft.com/en-us/library/7ykb2k5f for the +general description. + +**llvm-lib** has the following extensions: + +* Bitcode files in symbol tables. + **llvm-lib** includes symbols from both bitcode files and regular + object files in the symbol table. + +* Creating thin archives. + The /llvmlibthin option causes **llvm-lib** to create thin archive + that contain only the symbol table and the header for the various + members. These files are much smaller, but are not compatible with + link.exe (lld can handle them). diff --git a/docs/CommandGuide/llvm-profdata.rst b/docs/CommandGuide/llvm-profdata.rst index 7053b7fa710e..74fe4ee9d219 100644 --- a/docs/CommandGuide/llvm-profdata.rst +++ b/docs/CommandGuide/llvm-profdata.rst @@ -28,7 +28,7 @@ MERGE SYNOPSIS ^^^^^^^^ -:program:`llvm-profdata merge` [*options*] [*filenames...*] +:program:`llvm-profdata merge` [*options*] [*filename...*] DESCRIPTION ^^^^^^^^^^^ @@ -37,6 +37,14 @@ DESCRIPTION generated by PGO instrumentation and merges them together into a single indexed profile data file. +By default profile data is merged without modification. This means that the +relative importance of each input file is proportional to the number of samples +or counts it contains. In general, the input from a longer training run will be +interpreted as relatively more important than a shorter run. Depending on the +nature of the training runs it may be useful to adjust the weight given to each +input file by using the ``-weighted-input`` option. + + OPTIONS ^^^^^^^ @@ -49,28 +57,63 @@ OPTIONS Specify the output file name. *Output* cannot be ``-`` as the resulting indexed profile data can't be written to standard output. +.. 
option:: -weighted-input=weight,filename + + Specify an input file name along with a weight. The profile counts of the input + file will be scaled (multiplied) by the supplied ``weight``, where where ``weight`` + is a decimal integer >= 1. Input files specified without using this option are + assigned a default weight of 1. Examples are shown below. + .. option:: -instr (default) Specify that the input profile is an instrumentation-based profile. .. option:: -sample - Specify that the input profile is a sample-based profile. When using - sample-based profiles, the format of the generated file can be generated - in one of three ways: + Specify that the input profile is a sample-based profile. + + The format of the generated file can be generated in one of three ways: .. option:: -binary (default) - Emit the profile using a binary encoding. + Emit the profile using a binary encoding. For instrumentation-based profile + the output format is the indexed binary format. .. option:: -text - Emit the profile in text mode. + Emit the profile in text mode. This option can also be used with both + sample-based and instrumentation-based profile. When this option is used + the profile will be dumped in the text format that is parsable by the profile + reader. .. option:: -gcc Emit the profile using GCC's gcov format (Not yet supported). +EXAMPLES +^^^^^^^^ +Basic Usage ++++++++++++ +Merge three profiles: + +:: + + llvm-profdata merge foo.profdata bar.profdata baz.profdata -output merged.profdata + +Weighted Input +++++++++++++++ +The input file `foo.profdata` is especially important, multiply its counts by 10: + +:: + + llvm-profdata merge -weighted-input=10,foo.profdata bar.profdata baz.profdata -output merged.profdata + +Exactly equivalent to the previous invocation (explicit form; useful for programmatic invocation): + +:: + + llvm-profdata merge -weighted-input=10,foo.profdata -weighted-input=1,bar.profdata -weighted-input=1,baz.profdata -output merged.profdata + .. program:: llvm-profdata show .. _profdata-show: @@ -121,6 +164,13 @@ OPTIONS Specify that the input profile is an instrumentation-based profile. +.. option:: -text + + Instruct the profile dumper to show profile counts in the text format of the + instrumentation-based profile data representation. By default, the profile + information is dumped in a more human readable form (also in text) with + annotations. + .. option:: -sample Specify that the input profile is a sample-based profile. diff --git a/docs/CommandGuide/llvm-symbolizer.rst b/docs/CommandGuide/llvm-symbolizer.rst index 96720e633f2f..ec4178e4e7ab 100644 --- a/docs/CommandGuide/llvm-symbolizer.rst +++ b/docs/CommandGuide/llvm-symbolizer.rst @@ -56,6 +56,14 @@ EXAMPLE foo(int) /tmp/a.cc:12 + $cat addr.txt + 0x40054d + $llvm-symbolizer -inlining -print-address -pretty-print -obj=addr.exe < addr.txt + 0x40054d: inc at /tmp/x.c:3:3 + (inlined by) main at /tmp/x.c:9:0 + $llvm-symbolizer -inlining -pretty-print -obj=addr.exe < addr.txt + inc at /tmp/x.c:3:3 + (inlined by) main at /tmp/x.c:9:0 OPTIONS ------- @@ -98,6 +106,14 @@ OPTIONS location, look for the debug info at the .dSYM path provided via the ``-dsym-hint`` flag. This flag can be used multiple times. +.. option:: -print-address + + Print address before the source code location. Defaults to false. + +.. option:: -pretty-print + + Print human readable output. If ``-inlining`` is specified, enclosing scope is + prefixed by (inlined by). Refer to listed examples. 
EXIT STATUS ----------- diff --git a/docs/CommandLine.rst b/docs/CommandLine.rst index 1d85215f2af3..556c302501e2 100644 --- a/docs/CommandLine.rst +++ b/docs/CommandLine.rst @@ -1737,6 +1737,7 @@ exported by the ``lib/VMCore/PassManager.cpp`` file. .. _dynamically loaded options: Dynamically adding command line options +--------------------------------------- .. todo:: diff --git a/docs/CompileCudaWithLLVM.rst b/docs/CompileCudaWithLLVM.rst new file mode 100644 index 000000000000..a981ffe1e8f5 --- /dev/null +++ b/docs/CompileCudaWithLLVM.rst @@ -0,0 +1,169 @@ +=================================== +Compiling CUDA C/C++ with LLVM +=================================== + +.. contents:: + :local: + +Introduction +============ + +This document contains the user guides and the internals of compiling CUDA +C/C++ with LLVM. It is aimed at both users who want to compile CUDA with LLVM +and developers who want to improve LLVM for GPUs. This document assumes a basic +familiarity with CUDA. Information about CUDA programming can be found in the +`CUDA programming guide +<http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html>`_. + +How to Build LLVM with CUDA Support +=================================== + +Below is a quick summary of downloading and building LLVM. Consult the `Getting +Started <http://llvm.org/docs/GettingStarted.html>`_ page for more details on +setting up LLVM. + +#. Checkout LLVM + + .. code-block:: console + + $ cd where-you-want-llvm-to-live + $ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm + +#. Checkout Clang + + .. code-block:: console + + $ cd where-you-want-llvm-to-live + $ cd llvm/tools + $ svn co http://llvm.org/svn/llvm-project/cfe/trunk clang + +#. Configure and build LLVM and Clang + + .. code-block:: console + + $ cd where-you-want-llvm-to-live + $ mkdir build + $ cd build + $ cmake [options] .. + $ make + +How to Compile CUDA C/C++ with LLVM +=================================== + +We assume you have installed the CUDA driver and runtime. Consult the `NVIDIA +CUDA installation Guide +<https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ if +you have not. + +Suppose you want to compile and run the following CUDA program (``axpy.cu``) +which multiplies a ``float`` array by a ``float`` scalar (AXPY). + +.. code-block:: c++ + + #include <helper_cuda.h> // for checkCudaErrors + + #include <iostream> + + __global__ void axpy(float a, float* x, float* y) { + y[threadIdx.x] = a * x[threadIdx.x]; + } + + int main(int argc, char* argv[]) { + const int kDataLen = 4; + + float a = 2.0f; + float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f}; + float host_y[kDataLen]; + + // Copy input data to device. + float* device_x; + float* device_y; + checkCudaErrors(cudaMalloc(&device_x, kDataLen * sizeof(float))); + checkCudaErrors(cudaMalloc(&device_y, kDataLen * sizeof(float))); + checkCudaErrors(cudaMemcpy(device_x, host_x, kDataLen * sizeof(float), + cudaMemcpyHostToDevice)); + + // Launch the kernel. + axpy<<<1, kDataLen>>>(a, device_x, device_y); + + // Copy output data to host. + checkCudaErrors(cudaDeviceSynchronize()); + checkCudaErrors(cudaMemcpy(host_y, device_y, kDataLen * sizeof(float), + cudaMemcpyDeviceToHost)); + + // Print the results. + for (int i = 0; i < kDataLen; ++i) { + std::cout << "y[" << i << "] = " << host_y[i] << "\n"; + } + + checkCudaErrors(cudaDeviceReset()); + return 0; + } + +The command line for compilation is similar to what you would use for C++. + +.. 
code-block:: console + + $ clang++ -o axpy -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cu -lcudart_static -lcuda -ldl -lrt -pthread + $ ./axpy + y[0] = 2 + y[1] = 4 + y[2] = 6 + y[3] = 8 + +Note that ``helper_cuda.h`` comes from the CUDA samples, so you need the +samples installed for this example. ``<CUDA install path>`` is the root +directory where you installed CUDA SDK, typically ``/usr/local/cuda``. + +Optimizations +============= + +CPU and GPU have different design philosophies and architectures. For example, a +typical CPU has branch prediction, out-of-order execution, and is superscalar, +whereas a typical GPU has none of these. Due to such differences, an +optimization pipeline well-tuned for CPUs may be not suitable for GPUs. + +LLVM performs several general and CUDA-specific optimizations for GPUs. The +list below shows some of the more important optimizations for GPUs. Most of +them have been upstreamed to ``lib/Transforms/Scalar`` and +``lib/Target/NVPTX``. A few of them have not been upstreamed due to lack of a +customizable target-independent optimization pipeline. + +* **Straight-line scalar optimizations**. These optimizations reduce redundancy + in straight-line code. Details can be found in the `design document for + straight-line scalar optimizations <https://goo.gl/4Rb9As>`_. + +* **Inferring memory spaces**. `This optimization + <http://www.llvm.org/docs/doxygen/html/NVPTXFavorNonGenericAddrSpaces_8cpp_source.html>`_ + infers the memory space of an address so that the backend can emit faster + special loads and stores from it. Details can be found in the `design + document for memory space inference <https://goo.gl/5wH2Ct>`_. + +* **Aggressive loop unrooling and function inlining**. Loop unrolling and + function inlining need to be more aggressive for GPUs than for CPUs because + control flow transfer in GPU is more expensive. They also promote other + optimizations such as constant propagation and SROA which sometimes speed up + code by over 10x. An empirical inline threshold for GPUs is 1100. This + configuration has yet to be upstreamed with a target-specific optimization + pipeline. LLVM also provides `loop unrolling pragmas + <http://clang.llvm.org/docs/AttributeReference.html#pragma-unroll-pragma-nounroll>`_ + and ``__attribute__((always_inline))`` for programmers to force unrolling and + inling. + +* **Aggressive speculative execution**. `This transformation + <http://llvm.org/docs/doxygen/html/SpeculativeExecution_8cpp_source.html>`_ is + mainly for promoting straight-line scalar optimizations which are most + effective on code along dominator paths. + +* **Memory-space alias analysis**. `This alias analysis + <http://reviews.llvm.org/D12414>`_ infers that two pointers in different + special memory spaces do not alias. It has yet to be integrated to the new + alias analysis infrastructure; the new infrastructure does not run + target-specific alias analysis. + +* **Bypassing 64-bit divides**. `An existing optimization + <http://llvm.org/docs/doxygen/html/BypassSlowDivision_8cpp_source.html>`_ + enabled in the NVPTX backend. 64-bit integer divides are much slower than + 32-bit ones on NVIDIA GPUs due to lack of a divide unit. Many of the 64-bit + divides in our benchmarks have a divisor and dividend which fit in 32-bits at + runtime. This optimization provides a fast path for this common case. 
diff --git a/docs/CompilerWriterInfo.rst b/docs/CompilerWriterInfo.rst index 900ba24e230f..6c3ff4b10f1e 100644 --- a/docs/CompilerWriterInfo.rst +++ b/docs/CompilerWriterInfo.rst @@ -22,14 +22,16 @@ ARM * `ABI Addenda and Errata <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0045d/IHI0045D_ABI_addenda.pdf>`_ -* `ARM C Language Extensions <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053a/IHI0053A_acle.pdf>`_ +* `ARM C Language Extensions <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf>`_ AArch64 ------- +* `ARMv8 Architecture Reference Manual <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.h/index.html>`_ + * `ARMv8 Instruction Set Overview <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.genc010197a/index.html>`_ -* `ARM C Language Extensions <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053a/IHI0053A_acle.pdf>`_ +* `ARM C Language Extensions <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf>`_ Itanium (ia64) -------------- diff --git a/docs/CoverageMappingFormat.rst b/docs/CoverageMappingFormat.rst index 8fcffb838a3f..9ac476c88b34 100644 --- a/docs/CoverageMappingFormat.rst +++ b/docs/CoverageMappingFormat.rst @@ -405,7 +405,7 @@ LEB128 is an unsigned interger value that is encoded using DWARF's LEB128 encoding, optimizing for the case where values are small (1 byte for values less than 128). -.. _strings: +.. _Strings: Strings ^^^^^^^ diff --git a/docs/DeveloperPolicy.rst b/docs/DeveloperPolicy.rst index 9e458559fbcd..17baf2d27b13 100644 --- a/docs/DeveloperPolicy.rst +++ b/docs/DeveloperPolicy.rst @@ -505,8 +505,15 @@ for llvm users and not imposing a big burden on llvm developers: * The textual format is not backwards compatible. We don't change it too often, but there are no specific promises. -* The bitcode format produced by a X.Y release will be readable by all following - X.Z releases and the (X+1).0 release. +* Additions and changes to the IR should be reflected in + ``test/Bitcode/compatibility.ll``. + +* The bitcode format produced by a X.Y release will be readable by all + following X.Z releases and the (X+1).0 release. + +* After each X.Y release, ``compatibility.ll`` must be copied to + ``compatibility-X.Y.ll``. The corresponding bitcode file should be assembled + using the X.Y build and committed as ``compatibility-X.Y.ll.bc``. * Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, @@ -518,6 +525,33 @@ for llvm users and not imposing a big burden on llvm developers: it is to drop it. That is not very user friendly and a bit more effort is expected, but no promises are made. +C API Changes +---------------- + +* Stability Guarantees: The C API is, in general, a "best effort" for stability. + This means that we make every attempt to keep the C API stable, but that + stability will be limited by the abstractness of the interface and the + stability of the C++ API that it wraps. In practice, this means that things + like "create debug info" or "create this type of instruction" are likely to be + less stable than "take this IR file and JIT it for my current machine". + +* Release stability: We won't break the C API on the release branch with patches + that go on that branch, with the exception that we will fix an unintentional + C API break that will keep the release consistent with both the previous and + next release. 
+ +* Testing: Patches to the C API are expected to come with tests just like any + other patch. + +* Including new things into the API: If an LLVM subcomponent has a C API already + included, then expanding that C API is acceptable. Adding C API for + subcomponents that don't currently have one needs to be discussed on the + mailing list for design and maintainability feedback prior to implementation. + +* Documentation: Any changes to the C API are required to be documented in the + release notes so that it's clear to external users who do not follow the + project how the C API is changing and evolving. + .. _copyright-license-patents: Copyright, License, and Patents @@ -624,5 +658,5 @@ patent-related trouble with their changes (including from third parties). If you or your employer own the rights to a patent and would like to contribute code to LLVM that relies on it, we require that the copyright owner sign an agreement that allows any other user of LLVM to freely use your patent. Please -contact the `oversight group <mailto:llvm-oversight@cs.uiuc.edu>`_ for more +contact the `LLVM Foundation Board of Directors <mailto:board@llvm.org>`_ for more details. diff --git a/docs/ExceptionHandling.rst b/docs/ExceptionHandling.rst index 55ffdb45efe9..74827c02a272 100644 --- a/docs/ExceptionHandling.rst +++ b/docs/ExceptionHandling.rst @@ -67,17 +67,10 @@ exception handling is generally preferred to SJLJ. Windows Runtime Exception Handling ----------------------------------- -Windows runtime based exception handling uses the same basic IR structure as -Itanium ABI based exception handling, but it relies on the personality -functions provided by the native Windows runtime library, ``__CxxFrameHandler3`` -for C++ exceptions: ``__C_specific_handler`` for 64-bit SEH or -``_frame_handler3/4`` for 32-bit SEH. This results in a very different -execution model and requires some minor modifications to the initial IR -representation and a significant restructuring just before code generation. - -General information about the Windows x64 exception handling mechanism can be -found at `MSDN Exception Handling (x64) -<https://msdn.microsoft.com/en-us/library/1eyas8tf(v=vs.80).aspx>`_. +LLVM supports handling exceptions produced by the Windows runtime, but it +requires a very different intermediate representation. It is not based on the +":ref:`landingpad <i_landingpad>`" instruction like the other two models, and is +described later in this document under :ref:`wineh`. Overview -------- @@ -169,11 +162,11 @@ pad to the back end. For C++, the ``landingpad`` instruction returns a pointer and integer pair corresponding to the pointer to the *exception structure* and the *selector value* respectively. -The ``landingpad`` instruction takes a reference to the personality function to -be used for this ``try``/``catch`` sequence. The remainder of the instruction is -a list of *cleanup*, *catch*, and *filter* clauses. The exception is tested -against the clauses sequentially from first to last. The clauses have the -following meanings: +The ``landingpad`` instruction looks for a reference to the personality +function to be used for this ``try``/``catch`` sequence in the parent +function's attribute list. The instruction contains a list of *cleanup*, +*catch*, and *filter* clauses. The exception is tested against the clauses +sequentially from first to last. 
The clauses have the following meanings: - ``catch <type> @ExcType`` @@ -321,97 +314,6 @@ the selector results they understand and then resume exception propagation with the `resume instruction <LangRef.html#i_resume>`_ if none of the conditions match. -C++ Exception Handling using the Windows Runtime -================================================= - -(Note: Windows C++ exception handling support is a work in progress and is - not yet fully implemented. The text below describes how it will work - when completed.) - -The Windows runtime function for C++ exception handling uses a multi-phase -approach. When an exception occurs it searches the current callstack for a -frame that has a handler for the exception. If a handler is found, it then -calls the cleanup handler for each frame above the handler which has a -cleanup handler before calling the catch handler. These calls are all made -from a stack context different from the original frame in which the handler -is defined. Therefore, it is necessary to outline these handlers from their -original context before code generation. - -Catch handlers are called with a pointer to the handler itself as the first -argument and a pointer to the parent function's stack frame as the second -argument. The catch handler uses the `llvm.localrecover -<LangRef.html#llvm-localescape-and-llvm-localrecover-intrinsics>`_ to get a -pointer to a frame allocation block that is created in the parent frame using -the `llvm.localescape -<LangRef.html#llvm-localescape-and-llvm-localrecover-intrinsics>`_ intrinsic. -The ``WinEHPrepare`` pass will have created a structure definition for the -contents of this block. The first two members of the structure will always be -(1) a 32-bit integer that the runtime uses to track the exception state of the -parent frame for the purposes of handling chained exceptions and (2) a pointer -to the object associated with the exception (roughly, the parameter of the -catch clause). These two members will be followed by any frame variables from -the parent function which must be accessed in any of the functions unwind or -catch handlers. The catch handler returns the address at which execution -should continue. - -Cleanup handlers perform any cleanup necessary as the frame goes out of scope, -such as calling object destructors. The runtime handles the actual unwinding -of the stack. If an exception occurs in a cleanup handler the runtime manages -termination of the process. Cleanup handlers are called with the same arguments -as catch handlers (a pointer to the handler and a pointer to the parent stack -frame) and use the same mechanism described above to access frame variables -in the parent function. Cleanup handlers do not return a value. - -The IR generated for Windows runtime based C++ exception handling is initially -very similar to the ``landingpad`` mechanism described above. Calls to -libc++abi functions (such as ``__cxa_begin_catch``/``__cxa_end_catch`` and -``__cxa_throw_exception`` are replaced with calls to intrinsics or Windows -runtime functions (such as ``llvm.eh.begincatch``/``llvm.eh.endcatch`` and -``__CxxThrowException``). - -During the WinEHPrepare pass, the handler functions are outlined into handler -functions and the original landing pad code is replaced with a call to the -``llvm.eh.actions`` intrinsic that describes the order in which handlers will -be processed from the logical location of the landing pad and an indirect -branch to the return value of the ``llvm.eh.actions`` intrinsic. 
The -``llvm.eh.actions`` intrinsic is defined as returning the address at which -execution will continue. This is a temporary construct which will be removed -before code generation, but it allows for the accurate tracking of control -flow until then. - -A typical landing pad will look like this after outlining: - -.. code-block:: llvm - - lpad: - %vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) - cleanup - catch i8* bitcast (i8** @_ZTIi to i8*) - catch i8* bitcast (i8** @_ZTIf to i8*) - %recover = call i8* (...)* @llvm.eh.actions( - i32 3, i8* bitcast (i8** @_ZTIi to i8*), i8* (i8*, i8*)* @_Z4testb.catch.1) - i32 2, i8* null, void (i8*, i8*)* @_Z4testb.cleanup.1) - i32 1, i8* bitcast (i8** @_ZTIf to i8*), i8* (i8*, i8*)* @_Z4testb.catch.0) - i32 0, i8* null, void (i8*, i8*)* @_Z4testb.cleanup.0) - indirectbr i8* %recover, [label %try.cont1, label %try.cont2] - -In this example, the landing pad represents an exception handling context with -two catch handlers and a cleanup handler that have been outlined. If an -exception is thrown with a type that matches ``_ZTIi``, the ``_Z4testb.catch.1`` -handler will be called an no clean-up is needed. If an exception is thrown -with a type that matches ``_ZTIf``, first the ``_Z4testb.cleanup.1`` handler -will be called to perform unwind-related cleanup, then the ``_Z4testb.catch.1`` -handler will be called. If an exception is throw which does not match either -of these types and the exception is handled by another frame further up the -call stack, first the ``_Z4testb.cleanup.1`` handler will be called, then the -``_Z4testb.cleanup.0`` handler (which corresponds to a different scope) will be -called, and exception handling will continue at the next frame in the call -stack will be called. One of the catch handlers will return the address of -``%try.cont1`` in the parent function and the other will return the address of -``%try.cont2``, meaning that execution continues at one of those blocks after -an exception is caught. - - Exception Handling Intrinsics ============================= @@ -498,50 +400,19 @@ When used in the native Windows C++ exception handling implementation, this intrinsic serves as a placeholder to delimit code before a catch handler is outlined. After the handler is outlined, this intrinsic is simply removed. -.. _llvm.eh.actions: -``llvm.eh.actions`` ----------------------- +.. _llvm.eh.exceptionpointer: + +``llvm.eh.exceptionpointer`` +---------------------------- .. code-block:: llvm - void @llvm.eh.actions() - -This intrinsic represents the list of actions to take when an exception is -thrown. It is typically used by Windows exception handling schemes where cleanup -outlining is required by the runtime. The arguments are a sequence of ``i32`` -sentinels indicating the action type followed by some pre-determined number of -arguments required to implement that action. - -A code of ``i32 0`` indicates a cleanup action, which expects one additional -argument. The argument is a pointer to a function that implements the cleanup -action. - -A code of ``i32 1`` indicates a catch action, which expects three additional -arguments. Different EH schemes give different meanings to the three arguments, -but the first argument indicates whether the catch should fire, the second is -the localescape index of the exception object, and the third is the code to run -to catch the exception. 
-
-For Windows C++ exception handling, the first argument for a catch handler is a
-pointer to the RTTI type descriptor for the object to catch. The second
-argument is an index into the argument list of the ``llvm.localescape`` call in
-the main function. The exception object will be copied into the provided stack
-object. If the exception object is not required, this argument should be -1.
-The third argument is a pointer to a function implementing the catch. This
-function returns the address of the basic block where execution should resume
-after handling the exception.
-
-For Windows SEH, the first argument is a pointer to the filter function, which
-indicates if the exception should be caught or not. The second argument is
-typically negative one. The third argument is the address of a basic block
-where the exception will be handled. In other words, catch handlers are not
-outlined in SEH. After running cleanups, execution immediately resumes at this
-PC.
-
-In order to preserve the structure of the CFG, a call to '``llvm.eh.actions``'
-must be followed by an ':ref:`indirectbr <i_indirectbr>`' instruction that
-jumps to the result of the intrinsic call.
+  i8 addrspace(N)* @llvm.eh.padparam.pNi8(token %catchpad)
+
+
+This intrinsic retrieves a pointer to the exception caught by the given
+``catchpad``.
SJLJ Intrinsics
@@ -628,10 +499,279 @@ an exception handling frame for each function in a compile unit, plus a common
exception handling frame that defines information common to all functions in
the unit.
+The format of this call frame information (CFI) is often platform-dependent,
+however. ARM, for example, defines its own format. Apple has its own compact
+unwind info format. On Windows, another format is used for all architectures
+since 32-bit x86. LLVM will emit whatever information is required by the
+target.
+
 Exception Tables
 ----------------
 An exception table contains information about what actions to take when an
-exception is thrown in a particular part of a function's code. There is one
-exception table per function, except leaf functions and functions that have
-calls only to non-throwing functions. They do not need an exception table.
+exception is thrown in a particular part of a function's code. This is typically
+referred to as the language-specific data area (LSDA). The format of the LSDA
+table is specific to the personality function, but the majority of personalities
+out there use a variation of the tables consumed by ``__gxx_personality_v0``.
+There is one exception table per function, except leaf functions and functions
+that have calls only to non-throwing functions. They do not need an exception
+table.
+
+.. _wineh:
+
+Exception Handling using the Windows Runtime
+=================================================
+
+Background on Windows exceptions
+---------------------------------
+
+Interacting with exceptions on Windows is significantly more complicated than
+on Itanium C++ ABI platforms. The fundamental difference between the two models
+is that Itanium EH is designed around the idea of "successive unwinding," while
+Windows EH is not.
+
+Under Itanium, throwing an exception typically involves allocating thread local
+memory to hold the exception, and calling into the EH runtime. The runtime
+identifies frames with appropriate exception handling actions, and successively
+resets the register context of the current thread to the most recently active
+frame with actions to run.
In LLVM, execution resumes at a ``landingpad``
+instruction, which produces register values provided by the runtime. If a
+function is only cleaning up allocated resources, the function is responsible
+for calling ``_Unwind_Resume`` to transition to the next most recently active
+frame after it is finished cleaning up. Eventually, the frame responsible for
+handling the exception calls ``__cxa_end_catch`` to destroy the exception,
+release its memory, and resume normal control flow.
+
+The Windows EH model does not use these successive register context resets.
+Instead, the active exception is typically described by a frame on the stack.
+In the case of C++ exceptions, the exception object is allocated in stack memory
+and its address is passed to ``__CxxThrowException``. General purpose structured
+exceptions (SEH) are more analogous to Linux signals, and they are dispatched by
+userspace DLLs provided with Windows. Each frame on the stack has an assigned EH
+personality routine, which decides what actions to take to handle the exception.
+There are a few major personalities for C and C++ code: the C++ personality
+(``__CxxFrameHandler3``) and the SEH personalities (``_except_handler3``,
+``_except_handler4``, and ``__C_specific_handler``). All of them implement
+cleanups by calling back into a "funclet" contained in the parent function.
+
+Funclets, in this context, are regions of the parent function that can be called
+as though they were a function pointer with a very special calling convention.
+The frame pointer of the parent frame is passed into the funclet either using
+the standard EBP register or as the first parameter register, depending on the
+architecture. The funclet implements the EH action by accessing local variables
+in memory through the frame pointer, and returning some appropriate value,
+continuing the EH process. Variables that are live in to or out of the funclet cannot be
+allocated in registers.
+
+The C++ personality also uses funclets to contain the code for catch blocks
+(i.e. all user code between the braces in ``catch (Type obj) { ... }``). The
+runtime must use funclets for catch bodies because the C++ exception object is
+allocated in a child stack frame of the function handling the exception. If the
+runtime rewound the stack back to the frame of the catch, the memory holding the
+exception would be overwritten quickly by subsequent function calls. The use of
+funclets also allows ``__CxxFrameHandler3`` to implement rethrow without
+resorting to TLS. Instead, the runtime throws a special exception, and then uses
+SEH (``__try / __except``) to resume execution with new information in the child
+frame.
+
+In other words, the successive unwinding approach is incompatible with Visual
+C++ exceptions and general purpose Windows exception handling. Because the C++
+exception object lives in stack memory, LLVM cannot provide a custom personality
+function that uses landingpads. Similarly, SEH does not provide any mechanism
+to rethrow an exception or continue unwinding. Therefore, LLVM must use the IR
+constructs described later in this document to implement compatible exception
+handling.
+
+SEH filter expressions
+-----------------------
+
+The SEH personality functions also use funclets to implement filter expressions,
+which allow executing arbitrary user code to decide which exceptions to catch.
+Filter expressions should not be confused with the ``filter`` clause of the LLVM
+``landingpad`` instruction.
Typically filter expressions are used to determine +if the exception came from a particular DLL or code region, or if code faulted +while accessing a particular memory address range. LLVM does not currently have +IR to represent filter expressions because it is difficult to represent their +control dependencies. Filter expressions run during the first phase of EH, +before cleanups run, making it very difficult to build a faithful control flow +graph. For now, the new EH instructions cannot represent SEH filter +expressions, and frontends must outline them ahead of time. Local variables of +the parent function can be escaped and accessed using the ``llvm.localescape`` +and ``llvm.localrecover`` intrinsics. + +New exception handling instructions +------------------------------------ + +The primary design goal of the new EH instructions is to support funclet +generation while preserving information about the CFG so that SSA formation +still works. As a secondary goal, they are designed to be generic across MSVC +and Itanium C++ exceptions. They make very few assumptions about the data +required by the personality, so long as it uses the familiar core EH actions: +catch, cleanup, and terminate. However, the new instructions are hard to modify +without knowing details of the EH personality. While they can be used to +represent Itanium EH, the landingpad model is strictly better for optimization +purposes. + +The following new instructions are considered "exception handling pads", in that +they must be the first non-phi instruction of a basic block that may be the +unwind destination of an EH flow edge: +``catchswitch``, ``catchpad``, and ``cleanuppad``. +As with landingpads, when entering a try scope, if the +frontend encounters a call site that may throw an exception, it should emit an +invoke that unwinds to a ``catchswitch`` block. Similarly, inside the scope of a +C++ object with a destructor, invokes should unwind to a ``cleanuppad``. + +New instructions are also used to mark the points where control is transferred +out of a catch/cleanup handler (which will correspond to exits from the +generated funclet). A catch handler which reaches its end by normal execution +executes a ``catchret`` instruction, which is a terminator indicating where in +the function control is returned to. A cleanup handler which reaches its end +by normal execution executes a ``cleanupret`` instruction, which is a terminator +indicating where the active exception will unwind to next. + +Each of these new EH pad instructions has a way to identify which action should +be considered after this action. The ``catchswitch`` instruction is a terminator +and has an unwind destination operand analogous to the unwind destination of an +invoke. The ``cleanuppad`` instruction is not +a terminator, so the unwind destination is stored on the ``cleanupret`` +instruction instead. Successfully executing a catch handler should resume +normal control flow, so neither ``catchpad`` nor ``catchret`` instructions can +unwind. All of these "unwind edges" may refer to a basic block that contains an +EH pad instruction, or they may unwind to the caller. Unwinding to the caller +has roughly the same semantics as the ``resume`` instruction in the landingpad +model. When inlining through an invoke, instructions that unwind to the caller +are hooked up to unwind to the unwind destination of the call site. + +Putting things together, here is a hypothetical lowering of some C++ that uses +all of the new IR instructions: + +.. 
code-block:: c + + struct Cleanup { + Cleanup(); + ~Cleanup(); + int m; + }; + void may_throw(); + int f() noexcept { + try { + Cleanup obj; + may_throw(); + } catch (int e) { + may_throw(); + return e; + } + return 0; + } + +.. code-block:: llvm + + define i32 @f() nounwind personality i32 (...)* @__CxxFrameHandler3 { + entry: + %obj = alloca %struct.Cleanup, align 4 + %e = alloca i32, align 4 + %call = invoke %struct.Cleanup* @"\01??0Cleanup@@QEAA@XZ"(%struct.Cleanup* nonnull %obj) + to label %invoke.cont unwind label %lpad.catch + + invoke.cont: ; preds = %entry + invoke void @"\01?may_throw@@YAXXZ"() + to label %invoke.cont.2 unwind label %lpad.cleanup + + invoke.cont.2: ; preds = %invoke.cont + call void @"\01??_DCleanup@@QEAA@XZ"(%struct.Cleanup* nonnull %obj) nounwind + br label %return + + return: ; preds = %invoke.cont.3, %invoke.cont.2 + %retval.0 = phi i32 [ 0, %invoke.cont.2 ], [ %3, %invoke.cont.3 ] + ret i32 %retval.0 + + lpad.cleanup: ; preds = %invoke.cont.2 + %0 = cleanuppad within none [] + call void @"\01??1Cleanup@@QEAA@XZ"(%struct.Cleanup* nonnull %obj) nounwind + cleanupret %0 unwind label %lpad.catch + + lpad.catch: ; preds = %lpad.cleanup, %entry + %1 = catchswitch within none [label %catch.body] unwind label %lpad.terminate + + catch.body: ; preds = %lpad.catch + %catch = catchpad within %1 [%rtti.TypeDescriptor2* @"\01??_R0H@8", i32 0, i32* %e] + invoke void @"\01?may_throw@@YAXXZ"() + to label %invoke.cont.3 unwind label %lpad.terminate + + invoke.cont.3: ; preds = %catch.body + %3 = load i32, i32* %e, align 4 + catchret from %catch to label %return + + lpad.terminate: ; preds = %catch.body, %lpad.catch + cleanuppad within none [] + call void @"\01?terminate@@YAXXZ" + unreachable + } + +Funclet parent tokens +----------------------- + +In order to produce tables for EH personalities that use funclets, it is +necessary to recover the nesting that was present in the source. This funclet +parent relationship is encoded in the IR using tokens produced by the new "pad" +instructions. The token operand of a "pad" or "ret" instruction indicates which +funclet it is in, or "none" if it is not nested within another funclet. + +The ``catchpad`` and ``cleanuppad`` instructions establish new funclets, and +their tokens are consumed by other "pad" instructions to establish membership. +The ``catchswitch`` instruction does not create a funclet, but it produces a +token that is always consumed by its immediate successor ``catchpad`` +instructions. This ensures that every catch handler modelled by a ``catchpad`` +belongs to exactly one ``catchswitch``, which models the dispatch point after a +C++ try. + +Here is an example of what this nesting looks like using some hypothetical +C++ code: + +.. code-block:: c + + void f() { + try { + throw; + } catch (...) { + try { + throw; + } catch (...) { + } + } + } + +.. 
code-block:: llvm + + define void @f() #0 personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) { + entry: + invoke void @_CxxThrowException(i8* null, %eh.ThrowInfo* null) #1 + to label %unreachable unwind label %catch.dispatch + + catch.dispatch: ; preds = %entry + %0 = catchswitch within none [label %catch] unwind to caller + + catch: ; preds = %catch.dispatch + %1 = catchpad within %0 [i8* null, i32 64, i8* null] + invoke void @_CxxThrowException(i8* null, %eh.ThrowInfo* null) #1 + to label %unreachable unwind label %catch.dispatch2 + + catch.dispatch2: ; preds = %catch + %2 = catchswitch within %1 [label %catch3] unwind to caller + + catch3: ; preds = %catch.dispatch2 + %3 = catchpad within %2 [i8* null, i32 64, i8* null] + catchret from %3 to label %try.cont + + try.cont: ; preds = %catch3 + catchret from %1 to label %try.cont6 + + try.cont6: ; preds = %try.cont + ret void + + unreachable: ; preds = %catch, %entry + unreachable + } + +The "inner" ``catchswitch`` consumes ``%1`` which is produced by the outer +catchswitch. diff --git a/docs/ExtendingLLVM.rst b/docs/ExtendingLLVM.rst index 3fd54c8360e5..87f48c993425 100644 --- a/docs/ExtendingLLVM.rst +++ b/docs/ExtendingLLVM.rst @@ -49,9 +49,9 @@ function and then be turned into an instruction if warranted. Add an entry for your intrinsic. Describe its memory access characteristics for optimization (this controls whether it will be DCE'd, CSE'd, etc). Note - that any intrinsic using the ``llvm_int_ty`` type for an argument will - be deemed by ``tblgen`` as overloaded and the corresponding suffix will - be required on the intrinsic's name. + that any intrinsic using one of the ``llvm_any*_ty`` types for an argument or + return type will be deemed by ``tblgen`` as overloaded and the corresponding + suffix will be required on the intrinsic's name. #. ``llvm/lib/Analysis/ConstantFolding.cpp``: diff --git a/docs/Frontend/PerformanceTips.rst b/docs/Frontend/PerformanceTips.rst index 8d0abcd1c172..142d262eb657 100644 --- a/docs/Frontend/PerformanceTips.rst +++ b/docs/Frontend/PerformanceTips.rst @@ -11,12 +11,60 @@ Abstract The intended audience of this document is developers of language frontends targeting LLVM IR. This document is home to a collection of tips on how to -generate IR that optimizes well. As with any optimizer, LLVM has its strengths -and weaknesses. In some cases, surprisingly small changes in the source IR -can have a large effect on the generated code. +generate IR that optimizes well. + +IR Best Practices +================= + +As with any optimizer, LLVM has its strengths and weaknesses. In some cases, +surprisingly small changes in the source IR can have a large effect on the +generated code. + +Beyond the specific items on the list below, it's worth noting that the most +mature frontend for LLVM is Clang. As a result, the further your IR gets from what Clang might emit, the less likely it is to be effectively optimized. It +can often be useful to write a quick C program with the semantics you're trying +to model and see what decisions Clang's IRGen makes about what IR to emit. +Studying Clang's CodeGen directory can also be a good source of ideas. Note +that Clang and LLVM are explicitly version locked so you'll need to make sure +you're using a Clang built from the same svn revision or release as the LLVM +library you're using. As always, it's *strongly* recommended that you track +tip of tree development, particularly during bring up of a new project. + +The Basics +^^^^^^^^^^^ + +#. 
Make sure that your Modules contain both a data layout specification and
+   target triple. Without these pieces, none of the target specific optimizations
+   will be enabled. This can have a major effect on the generated code quality.
+
+#. For each function or global emitted, use the most private linkage type
+   possible (private, internal or linkonce_odr preferably). Doing so will
+   make LLVM's inter-procedural optimizations much more effective.
+
+#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds
+   of predecessors). Among other issues, the register allocator is known to
+   perform badly when confronted with such structures. The only exception to
+   this guidance is that a unified return block with high in-degree is fine.
+
+Use of allocas
+^^^^^^^^^^^^^^
+
+An alloca instruction can be used to represent a function scoped stack slot,
+but can also represent dynamic frame expansion. When representing function
+scoped variables or locations, placing alloca instructions at the beginning of
+the entry block should be preferred. In particular, place them before any
+call instructions. Call instructions might get inlined and replaced with
+multiple basic blocks. The end result is that a following alloca instruction
+would no longer be in the entry basic block afterward.
+
+The SROA (Scalar Replacement Of Aggregates) and Mem2Reg passes only attempt
+to eliminate alloca instructions that are in the entry basic block. Given that
+SSA is the canonical form expected by much of the optimizer, if allocas can
+not be eliminated by Mem2Reg or SROA, the optimizer is likely to be less
+effective than it could be.
 Avoid loads and stores of large aggregate type
-================================================
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 LLVM currently does not optimize well loads and stores of large :ref:`aggregate
 types <t_aggregate>` (i.e. structs and arrays). As an alternative, consider
@@ -27,7 +75,7 @@ instruction supported by the targeted hardware are well supported. These can
 be an effective way to represent collections of small packed fields.
 Prefer zext over sext when legal
-==================================
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 On some architectures (X86_64 is one), sign extension can involve an extra
 instruction whereas zero extension can be folded into a load. LLVM will try to
@@ -39,7 +87,7 @@ Alternatively, you can :ref:`specify the range of the value using metadata
 <range-metadata>` and LLVM can do the sext to zext conversion for you.
 Zext GEP indices to machine register width
-============================================
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Internally, LLVM often promotes the width of GEP indices to machine register
 width. When it does so, it will default to using sign extension (sext)
@@ -47,47 +95,37 @@ operations for safety. If your source language provides information about
 the range of the index, you may wish to manually extend indices to machine
 register width using a zext instruction.
-Other things to consider
-=========================
-
-#. Make sure that a DataLayout is provided (this will likely become required in
-   the near future, but is certainly important for optimization).
-
-#. Add nsw/nuw flags as appropriate. Reasoning about overflow is
-   generally hard for an optimizer so providing these facts from the frontend
-   can be very impactful.
-
-#. Use fast-math flags on floating point operations if legal.
If you don't - need strict IEEE floating point semantics, there are a number of additional - optimizations that can be performed. This can be highly impactful for - floating point intensive computations. - -#. Use inbounds on geps. This can help to disambiguate some aliasing queries. - -#. Add noalias/align/dereferenceable/nonnull to function arguments and return - values as appropriate - -#. Mark functions as readnone/readonly or noreturn/nounwind when known. The - optimizer will try to infer these flags, but may not always be able to. - Manual annotations are particularly important for external functions that - the optimizer can not analyze. +When to specify alignment +^^^^^^^^^^^^^^^^^^^^^^^^^^ +LLVM will always generate correct code if you don’t specify alignment, but may +generate inefficient code. For example, if you are targeting MIPS (or older +ARM ISAs) then the hardware does not handle unaligned loads and stores, and +so you will enter a trap-and-emulate path if you do a load or store with +lower-than-natural alignment. To avoid this, LLVM will emit a slower +sequence of loads, shifts and masks (or load-right + load-left on MIPS) for +all cases where the load / store does not have a sufficiently high alignment +in the IR. + +The alignment is used to guarantee the alignment on allocas and globals, +though in most cases this is unnecessary (most targets have a sufficiently +high default alignment that they’ll be fine). It is also used to provide a +contract to the back end saying ‘either this load/store has this alignment, or +it is undefined behavior’. This means that the back end is free to emit +instructions that rely on that alignment (and mid-level optimizers are free to +perform transforms that require that alignment). For x86, it doesn’t make +much difference, as almost all instructions are alignment-independent. For +MIPS, it can make a big difference. + +Note that if your loads and stores are atomic, the backend will be unable to +lower an under aligned access into a sequence of natively aligned accesses. +As a result, alignment is mandatory for atomic loads and stores. + +Other Things to Consider +^^^^^^^^^^^^^^^^^^^^^^^^ #. Use ptrtoint/inttoptr sparingly (they interfere with pointer aliasing analysis), prefer GEPs -#. Use the lifetime.start/lifetime.end and invariant.start/invariant.end - intrinsics where possible. Common profitable uses are for stack like data - structures (thus allowing dead store elimination) and for describing - life times of allocas (thus allowing smaller stack sizes). - -#. Use pointer aliasing metadata, especially tbaa metadata, to communicate - otherwise-non-deducible pointer aliasing facts - -#. Use the "most-private" possible linkage types for the functions being defined - (private, internal or linkonce_odr preferably) - -#. Mark invariant locations using !invariant.load and TBAA's constant flags - #. Prefer globals over inttoptr of a constant address - this gives you dereferencability information. In MCJIT, use getSymbolAddress to provide actual address. @@ -104,15 +142,6 @@ Other things to consider desired. This is generally not required because the optimizer will convert an invoke with an unreachable unwind destination to a call instruction. -#. If you language uses range checks, consider using the IRCE pass. It is not - currently part of the standard pass order. - -#. For languages with numerous rarely executed guard conditions (e.g. 
null - checks, type checks, range checks) consider adding an extra execution or - two of LoopUnswith and LICM to your pass order. The standard pass order, - which is tuned for C and C++ applications, may not be sufficient to remove - all dischargeable checks from loops. - #. Use profile metadata to indicate statically known cold paths, even if dynamic profiling information is not available. This can make a large difference in code placement and thus the performance of tight loops. @@ -136,11 +165,6 @@ Other things to consider improvement. Note that this is not always profitable and does involve a potentially large increase in code size. -#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds - of predecessors). Among other issues, the register allocator is known to - perform badly with confronted with such structures. The only exception to - this guidance is that a unified return block with high in-degree is fine. - #. When checking a value against a constant, emit the check using a consistent comparison type. The GVN pass *will* optimize redundant equalities even if the type of comparison is inverted, but GVN only runs late in the pipeline. @@ -164,10 +188,99 @@ Other things to consider time and optimization effectiveness. The former is fixable with enough effort, but the later is fairly fundamental to their designed purpose. -p.s. If you want to help improve this document, patches expanding any of the -above items into standalone sections of their own with a more complete -discussion would be very welcome. +Describing Language Specific Properties +======================================= + +When translating a source language to LLVM, finding ways to express concepts +and guarantees available in your source language which are not natively +provided by LLVM IR will greatly improve LLVM's ability to optimize your code. +As an example, C/C++'s ability to mark every add as "no signed wrap (nsw)" goes +a long way to assisting the optimizer in reasoning about loop induction +variables and thus generating more optimal code for loops. + +The LLVM LangRef includes a number of mechanisms for annotating the IR with +additional semantic information. It is *strongly* recommended that you become +highly familiar with this document. The list below is intended to highlight a +couple of items of particular interest, but is by no means exhaustive. + +Restricted Operation Semantics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +#. Add nsw/nuw flags as appropriate. Reasoning about overflow is + generally hard for an optimizer so providing these facts from the frontend + can be very impactful. + +#. Use fast-math flags on floating point operations if legal. If you don't + need strict IEEE floating point semantics, there are a number of additional + optimizations that can be performed. This can be highly impactful for + floating point intensive computations. + +Describing Aliasing Properties +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +#. Add noalias/align/dereferenceable/nonnull to function arguments and return + values as appropriate + +#. Use pointer aliasing metadata, especially tbaa metadata, to communicate + otherwise-non-deducible pointer aliasing facts + +#. Use inbounds on geps. This can help to disambiguate some aliasing queries. + + +Modeling Memory Effects +^^^^^^^^^^^^^^^^^^^^^^^^ + +#. Mark functions as readnone/readonly/argmemonly or noreturn/nounwind when + known. The optimizer will try to infer these flags, but may not always be + able to. 
Manual annotations are particularly important for external
+   functions that the optimizer can not analyze.
+
+#. Use the lifetime.start/lifetime.end and invariant.start/invariant.end
+   intrinsics where possible. Common profitable uses are for stack like data
+   structures (thus allowing dead store elimination) and for describing
+   life times of allocas (thus allowing smaller stack sizes).
+
+#. Mark invariant locations using !invariant.load and TBAA's constant flags
+
+Pass Ordering
+^^^^^^^^^^^^^
+
+One of the most common mistakes made by new language frontend projects is to
+use the existing -O2 or -O3 pass pipelines as is. These pass pipelines make a
+good starting point for an optimizing compiler for any language, but they have
+been carefully tuned for C and C++, not your target language. You will almost
+certainly need to use a custom pass order to achieve optimal performance. A
+couple specific suggestions:
+
+#. For languages with numerous rarely executed guard conditions (e.g. null
+   checks, type checks, range checks) consider adding an extra execution or
+   two of LoopUnswitch and LICM to your pass order. The standard pass order,
+   which is tuned for C and C++ applications, may not be sufficient to remove
+   all dischargeable checks from loops.
+
+#. If your language uses range checks, consider using the IRCE pass. It is not
+   currently part of the standard pass order.
+
+#. A useful sanity check is to run your optimized IR back through the
+   -O2 pipeline again. If you see noticeable improvement in the resulting IR,
+   you likely need to adjust your pass order.
+
+
+I Still Can't Find What I'm Looking For
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you didn't find what you were looking for above, consider proposing a piece
+of metadata which provides the optimization hint you need. Such extensions are
+relatively common and are generally well received by the community. You will
+need to ensure that your proposal is sufficiently general so that it benefits
+others if you wish to contribute it upstream.
+
+You should also consider describing the problem you're facing on `llvm-dev
+<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ and asking for advice.
+It's entirely possible someone has encountered your problem before and can
+give good advice. If there are multiple interested parties, that also
+increases the chances that a metadata extension would be well received by the
+community as a whole.
 Adding to this document
 =======================
diff --git a/docs/GettingStarted.rst b/docs/GettingStarted.rst index 900ba24e230f..2585ce135ba6 100644 index df6bd7bc6ba8..2585ce135ba6 100644 --- a/docs/GettingStarted.rst +++ b/docs/GettingStarted.rst
@@ -1,5 +1,5 @@
 ====================================
-Getting Started with the LLVM System
+Getting Started with the LLVM System
 ====================================
 .. contents::
@@ -49,12 +49,25 @@ Here's the short story for getting up and running quickly with LLVM:
   * ``cd llvm/tools``
   * ``svn co http://llvm.org/svn/llvm-project/cfe/trunk clang``
-#. Checkout Compiler-RT:
+#. Checkout Compiler-RT (required to build the sanitizers):
   * ``cd where-you-want-llvm-to-live``
   * ``cd llvm/projects``
   * ``svn co http://llvm.org/svn/llvm-project/compiler-rt/trunk compiler-rt``
+#. Checkout Libomp (required for OpenMP support):
+
+   * ``cd where-you-want-llvm-to-live``
+   * ``cd llvm/projects``
+   * ``svn co http://llvm.org/svn/llvm-project/openmp/trunk openmp``
+
+#.
Checkout libcxx and libcxxabi **[Optional]**: + + * ``cd where-you-want-llvm-to-live`` + * ``cd llvm/projects`` + * ``svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx`` + * ``svn co http://llvm.org/svn/llvm-project/libcxxabi/trunk libcxxabi`` + #. Get the Test Suite Source Code **[Optional]** * ``cd where-you-want-llvm-to-live`` @@ -62,7 +75,7 @@ Here's the short story for getting up and running quickly with LLVM: * ``svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite`` #. Configure and build LLVM and Clang: - + The usual build uses `CMake <CMake.html>`_. If you would rather use autotools, see `Building LLVM with autotools <BuildingLLVMWithAutotools.html>`_. @@ -70,16 +83,16 @@ Here's the short story for getting up and running quickly with LLVM: * ``mkdir build`` * ``cd build`` * ``cmake -G <generator> [options] <path to llvm sources>`` - + Some common generators are: * ``Unix Makefiles`` --- for generating make-compatible parallel makefiles. * ``Ninja`` --- for generating `Ninja <http://martine.github.io/ninja/>` - build files. + build files. Most llvm developers use Ninja. * ``Visual Studio`` --- for generating Visual Studio projects and solutions. * ``Xcode`` --- for generating Xcode projects. - + Some Common options: * ``-DCMAKE_INSTALL_PREFIX=directory`` --- Specify for *directory* the full @@ -125,20 +138,20 @@ Hardware LLVM is known to work on the following host platforms: ================== ===================== ============= -OS Arch Compilers +OS Arch Compilers ================== ===================== ============= -Linux x86\ :sup:`1` GCC, Clang -Linux amd64 GCC, Clang -Linux ARM\ :sup:`4` GCC, Clang -Linux PowerPC GCC, Clang -Solaris V9 (Ultrasparc) GCC -FreeBSD x86\ :sup:`1` GCC, Clang -FreeBSD amd64 GCC, Clang -MacOS X\ :sup:`2` PowerPC GCC -MacOS X x86 GCC, Clang -Cygwin/Win32 x86\ :sup:`1, 3` GCC -Windows x86\ :sup:`1` Visual Studio -Windows x64 x86-64 Visual Studio +Linux x86\ :sup:`1` GCC, Clang +Linux amd64 GCC, Clang +Linux ARM\ :sup:`4` GCC, Clang +Linux PowerPC GCC, Clang +Solaris V9 (Ultrasparc) GCC +FreeBSD x86\ :sup:`1` GCC, Clang +FreeBSD amd64 GCC, Clang +MacOS X\ :sup:`2` PowerPC GCC +MacOS X x86 GCC, Clang +Cygwin/Win32 x86\ :sup:`1, 3` GCC +Windows x86\ :sup:`1` Visual Studio +Windows x64 x86-64 Visual Studio ================== ===================== ============= .. note:: @@ -207,14 +220,14 @@ Unix utilities. Specifically: * **chmod** --- change permissions on a file * **cat** --- output concatenation utility * **cp** --- copy files -* **date** --- print the current date/time +* **date** --- print the current date/time * **echo** --- print to standard output * **egrep** --- extended regular expression search utility * **find** --- find files/dirs in a file system * **grep** --- regular expression search utility * **gzip** --- gzip command for distribution generation * **gunzip** --- gunzip command for distribution checking -* **install** --- install directories/files +* **install** --- install directories/files * **mkdir** --- create a directory * **mv** --- move (rename) files * **ranlib** --- symbol table builder for archive libraries @@ -521,13 +534,28 @@ If you want to check out clang too, run: % cd llvm/tools % git clone http://llvm.org/git/clang.git -If you want to check out compiler-rt too, run: +If you want to check out compiler-rt (required to build the sanitizers), run: .. 
code-block:: console % cd llvm/projects % git clone http://llvm.org/git/compiler-rt.git +If you want to check out libomp (required for OpenMP support), run: + +.. code-block:: console + + % cd llvm/projects + % git clone http://llvm.org/git/openmp.git + +If you want to check out libcxx and libcxxabi (optional), run: + +.. code-block:: console + + % cd llvm/projects + % git clone http://llvm.org/git/libcxx.git + % git clone http://llvm.org/git/libcxxabi.git + If you want to check out the Test Suite Source Code (optional), run: .. code-block:: console @@ -619,7 +647,7 @@ To set up clone from which you can submit code using ``git-svn``, run: % git config svn-remote.svn.fetch :refs/remotes/origin/master % git svn rebase -l -Likewise for compiler-rt and test-suite. +Likewise for compiler-rt, libomp and test-suite. To update this clone without generating git-svn tags that conflict with the upstream Git repo, run: @@ -633,7 +661,7 @@ upstream Git repo, run: git checkout master && git svn rebase -l) -Likewise for compiler-rt and test-suite. +Likewise for compiler-rt, libomp and test-suite. This leaves your working directories on their master branches, so you'll need to ``checkout`` each working branch individually and ``rebase`` it on top of its @@ -838,7 +866,7 @@ with the latest Xcode: .. code-block:: console - % cmake -G "Ninja" -DCMAKE_OSX_ARCHITECTURES=“armv7;armv7s;arm64" + % cmake -G "Ninja" -DCMAKE_OSX_ARCHITECTURES="armv7;armv7s;arm64" -DCMAKE_TOOLCHAIN_FILE=<PATH_TO_LLVM>/cmake/platforms/iOS.cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_RUNTIME=Off -DLLVM_INCLUDE_TESTS=Off -DLLVM_INCLUDE_EXAMPLES=Off -DLLVM_ENABLE_BACKTRACES=Off [options] @@ -881,7 +909,7 @@ Underneath that directory there is another directory with a name ending in For example: .. code-block:: console - + % cd llvm_build_dir % find lib/Support/ -name APFloat* lib/Support/CMakeFiles/LLVMSupport.dir/APFloat.cpp.o @@ -990,7 +1018,7 @@ different `tools`_. code generation. For example, the ``llvm/lib/Target/X86`` directory holds the X86 machine description while ``llvm/lib/Target/ARM`` implements the ARM backend. - + ``llvm/lib/CodeGen/`` This directory contains the major parts of the code generator: Instruction @@ -1075,7 +1103,7 @@ the `Command Guide <CommandGuide/index.html>`_. The archiver produces an archive containing the given LLVM bitcode files, optionally with an index for faster lookup. - + ``llvm-as`` The assembler transforms the human readable LLVM assembly to LLVM bitcode. @@ -1088,7 +1116,7 @@ the `Command Guide <CommandGuide/index.html>`_. ``llvm-link``, not surprisingly, links multiple LLVM modules into a single program. - + ``lli`` ``lli`` is the LLVM interpreter, which can directly execute LLVM bitcode @@ -1219,7 +1247,7 @@ Example with clang .. code-block:: console % ./hello - + and .. code-block:: console diff --git a/docs/HowToBuildOnARM.rst b/docs/HowToBuildOnARM.rst index 6579d36a72a6..356c846d82bc 100644 --- a/docs/HowToBuildOnARM.rst +++ b/docs/HowToBuildOnARM.rst @@ -18,33 +18,44 @@ Here are some notes on building/testing LLVM/Clang on ARM. Note that ARM encompasses a wide variety of CPUs; this advice is primarily based on the ARMv6 and ARMv7 architectures and may be inapplicable to older chips. -#. If you are building LLVM/Clang on an ARM board with 1G of memory or less, - please use ``gold`` rather then GNU ``ld``. - Building LLVM/Clang with ``--enable-optimized`` - is preferred since it consumes less memory. Otherwise, the building - process will very likely fail due to insufficient memory. 
In any
-   case it is probably a good idea to set up a swap partition.
+#. The most popular Linaro/Ubuntu OS's for ARM boards, e.g., the
+   Pandaboard, have become hard-float platforms. There are a number of
+   choices when using CMake. Autoconf usage is deprecated as of 3.8.
-#. If you want to run ``make check-all`` after building LLVM/Clang, to avoid
-   false alarms (e.g., ARCMT failure) please use at least the following
-   configuration:
+   Building LLVM/Clang in ``Release`` mode is preferred since it consumes
+   a lot less memory. Otherwise, the building process will very likely
+   fail due to insufficient memory. It's also a lot quicker to only build
+   the relevant back-ends (ARM and AArch64), since it's very unlikely that
+   you'll use an ARM board to cross-compile to other arches. If you're
+   running Compiler-RT tests, also include the x86 back-end, or some tests
+   will fail.
 .. code-block:: bash
-  $ ../$LLVM_SRC_DIR/configure --with-abi=aapcs-vfp
+  cmake $LLVM_SRC_DIR -DCMAKE_BUILD_TYPE=Release \
+        -DLLVM_TARGETS_TO_BUILD="ARM;X86;AArch64"
-#. The most popular Linaro/Ubuntu OS's for ARM boards, e.g., the
-   Pandaboard, have become hard-float platforms. The following set
-   of configuration options appears to be a good choice for this
-   platform:
+   Other options you can use are:
+
+   .. code-block:: bash
+
+     Use Ninja instead of Make: "-G Ninja"
+     Build with assertions on: "-DLLVM_ENABLE_ASSERTIONS=True"
+     Force Python2: "-DPYTHON_EXECUTABLE=/usr/bin/python2"
+     Local (non-sudo) install path: "-DCMAKE_INSTALL_PREFIX=$HOME/llvm/install"
+     CPU flags: "-DCMAKE_C_FLAGS=-mcpu=cortex-a15" (same for CXX_FLAGS)
+
+   After that, just typing ``make -jN`` or ``ninja`` will build everything.
+   ``make -jN check-all`` or ``ninja check-all`` will run all compiler tests. For
+   running the test suite, please refer to :doc:`TestingGuide`.
+
+#. If you are building LLVM/Clang on an ARM board with 1G of memory or less,
+   please use ``gold`` rather than GNU ``ld``. In any case it is probably a good
+   idea to set up a swap partition, too.
 .. code-block:: bash
-  ../$LLVM_SRC_DIR/configure --build=armv7l-unknown-linux-gnueabihf \
-    --host=armv7l-unknown-linux-gnueabihf \
-    --target=armv7l-unknown-linux-gnueabihf --with-cpu=cortex-a9 \
-    --with-float=hard --with-abi=aapcs-vfp --with-fpu=neon \
-    --enable-targets=arm --enable-optimized --enable-assertions
+  $ sudo ln -sf /usr/bin/ld /usr/bin/ld.gold
 #. ARM development boards can be unstable and you may experience that cores
   are disappearing, caches being flushed on every big.LITTLE switch, and
@@ -58,6 +69,10 @@ on the ARMv6 and ARMv7 architectures and may be inapplicable to older chips.
     sudo cpufreq-set -c $cpu -g performance
   done
+   Remember to turn that off after the build, or you may risk burning your
+   CPU. Most modern kernels don't need that, so only use it if you have
+   problems.
+
 #. Running the build on SD cards is ok, but they are more prone to failures
   than good quality USB sticks, and those are more prone to failures than
   external hard-drives (those are also a lot faster). So, at least, you
@@ -66,4 +81,5 @@ on the ARMv6 and ARMv7 architectures and may be inapplicable to older chips.
 #. Make sure you have a decent power supply (dozens of dollars worth) that can
   provide *at least* 4 amperes, this is especially important if you use USB
-   devices with your board.
+   devices with your board. Externally powered USB/SATA hard drives are even
+   better than having a good power supply.
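As a concrete illustration of the swap advice above, here is one way to add swap space on a typical Linux ARM board. This is a minimal sketch using a swap file rather than a dedicated partition; the 2G size and ``/swapfile`` path are placeholders to adjust for your board, and ``fallocate`` may need to be replaced by the ``dd`` alternative on filesystems that do not support it.

.. code-block:: bash

  # Create and enable a 2 GB swap file (size and path are examples).
  $ sudo fallocate -l 2G /swapfile   # or: sudo dd if=/dev/zero of=/swapfile bs=1M count=2048
  $ sudo chmod 600 /swapfile
  $ sudo mkswap /swapfile
  $ sudo swapon /swapfile

Swap on slow SD storage is not fast, but it is usually enough to let the memory-hungry link steps finish instead of being killed by the kernel's OOM handling.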
diff --git a/docs/HowToReleaseLLVM.rst b/docs/HowToReleaseLLVM.rst index 26e9f3b2ee87..33c547e97a88 100644 --- a/docs/HowToReleaseLLVM.rst +++ b/docs/HowToReleaseLLVM.rst @@ -136,51 +136,24 @@ Regenerate the configure scripts for both ``llvm`` and the ``test-suite``. In addition, the version numbers of all the Bugzilla components must be updated for the next release. -Build the LLVM Release Candidates -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Tagging the LLVM Release Candidates +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Create release candidates for ``llvm``, ``clang``, ``dragonegg``, and the LLVM -``test-suite`` by tagging the branch with the respective release candidate -number. For instance, to create **Release Candidate 1** you would issue the -following commands: +Tag release candidates using the tag.sh script in utils/release. :: - $ svn mkdir https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XYZ - $ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \ - https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XYZ/rc1 - - $ svn mkdir https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XYZ - $ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \ - https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XYZ/rc1 - - $ svn mkdir https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XYZ - $ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \ - https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XYZ/rc1 - - $ svn mkdir https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XYZ - $ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \ - https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XYZ/rc1 - -Similarly, **Release Candidate 2** would be named ``RC2`` and so on. This keeps -a permanent copy of the release candidate around for people to export and build -as they wish. The final released sources will be tagged in the ``RELEASE_XYZ`` -directory as ``Final`` (c.f. :ref:`tag`). + $ ./tag.sh -release X.Y.Z -rc $RC The Release Manager may supply pre-packaged source tarballs for users. This can -be done with the following commands: +be done with the export.sh script in utils/release. :: - $ svn export https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XYZ/rc1 llvm-X.Yrc1 - $ svn export https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XYZ/rc1 clang-X.Yrc1 - $ svn export https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XYZ/rc1 dragonegg-X.Yrc1 - $ svn export https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XYZ/rc1 llvm-test-X.Yrc1 + $ ./export.sh -release X.Y.Z -rc $RC - $ tar -cvf - llvm-X.Yrc1 | gzip > llvm-X.Yrc1.src.tar.gz - $ tar -cvf - clang-X.Yrc1 | gzip > clang-X.Yrc1.src.tar.gz - $ tar -cvf - dragonegg-X.Yrc1 | gzip > dragonegg-X.Yrc1.src.tar.gz - $ tar -cvf - llvm-test-X.Yrc1 | gzip > llvm-test-X.Yrc1.src.tar.gz +This will generate source tarballs for each LLVM project being validated, which +can be uploaded to the website for further testing. Building the Release -------------------- @@ -384,21 +357,11 @@ mainline into the release branch. Tag the LLVM Final Release ^^^^^^^^^^^^^^^^^^^^^^^^^^ -Tag the final release sources using the following procedure: +Tag the final release sources using the tag.sh script in utils/release. 
:: - $ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \ - https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XYZ/Final - - $ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \ - https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XYZ/Final - - $ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \ - https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XYZ/Final - - $ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \ - https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XYZ/Final + $ ./tag.sh -release X.Y.Z -final Update the LLVM Demo Page ------------------------- diff --git a/docs/LangRef.rst b/docs/LangRef.rst index 0039d014275a..103d876b3cef 100644 --- a/docs/LangRef.rst +++ b/docs/LangRef.rst @@ -83,7 +83,7 @@ identifiers, for different purposes: can be used on global variables to suppress mangling. #. Unnamed values are represented as an unsigned numeric value with their prefix. For example, ``%12``, ``@2``, ``%44``. -#. Constants, which are described in the section Constants_ below. +#. Constants, which are described in the section Constants_ below. LLVM requires that values start with a prefix for two reasons: Compilers don't need to worry about name clashes with reserved words, and the set @@ -204,14 +204,15 @@ linkage: (``STB_LOCAL`` in the case of ELF) in the object file. This corresponds to the notion of the '``static``' keyword in C. ``available_externally`` - Globals with "``available_externally``" linkage are never emitted - into the object file corresponding to the LLVM module. They exist to - allow inlining and other optimizations to take place given knowledge - of the definition of the global, which is known to be somewhere - outside the module. Globals with ``available_externally`` linkage - are allowed to be discarded at will, and are otherwise the same as - ``linkonce_odr``. This linkage type is only allowed on definitions, - not declarations. + Globals with "``available_externally``" linkage are never emitted into + the object file corresponding to the LLVM module. From the linker's + perspective, an ``available_externally`` global is equivalent to + an external declaration. They exist to allow inlining and other + optimizations to take place given knowledge of the definition of the + global, which is known to be somewhere outside the module. Globals + with ``available_externally`` linkage are allowed to be discarded at + will, and allow inlining and other optimizations. This linkage type is + only allowed on definitions, not declarations. ``linkonce`` Globals with "``linkonce``" linkage are merged with other globals of the same name when linkage occurs. This can be used to implement @@ -257,7 +258,7 @@ linkage: Some languages allow differing globals to be merged, such as two functions with different semantics. Other languages, such as ``C++``, ensure that only equivalent globals are ever merged (the - "one definition rule" --- "ODR"). Such languages can use the + "one definition rule" --- "ODR"). Such languages can use the ``linkonce_odr`` and ``weak_odr`` linkage types to indicate that the global will only be merged with equivalent globals. These linkage types are otherwise the same as their non-``odr`` versions. @@ -406,6 +407,26 @@ added in the future: This calling convention, like the `PreserveMost` calling convention, will be used by a future version of the ObjectiveC runtime and should be considered experimental at this time. 
+"``cxx_fast_tlscc``" - The `CXX_FAST_TLS` calling convention for access functions + Clang generates an access function to access C++-style TLS. The access + function generally has an entry block, an exit block and an initialization + block that is run at the first time. The entry and exit blocks can access + a few TLS IR variables, each access will be lowered to a platform-specific + sequence. + + This calling convention aims to minimize overhead in the caller by + preserving as many registers as possible (all the registers that are + perserved on the fast path, composed of the entry and exit blocks). + + This calling convention behaves identical to the `C` calling convention on + how arguments and return values are passed, but it uses a different set of + caller/callee-saved registers. + + Given that each platform has its own lowering sequence, hence its own set + of preserved registers, we can't use the existing `PreserveMost`. + + - On X86-64 the callee preserves all general purpose registers, except for + RDI and RAX. "``cc <n>``" - Numbered convention Any calling convention may be specified by number, allowing target-specific calling conventions to be used. Target specific @@ -491,26 +512,29 @@ more information on under which circumstances the different models may be used. The target may choose a different TLS model if the specified model is not supported, or if a better choice of model can be made. -A model can also be specified in a alias, but then it only governs how +A model can also be specified in an alias, but then it only governs how the alias is accessed. It will not have any effect in the aliasee. +For platforms without linker support of ELF TLS model, the -femulated-tls +flag can be used to generate GCC compatible emulated TLS code. + .. _namedtypes: Structure Types --------------- LLVM IR allows you to specify both "identified" and "literal" :ref:`structure -types <t_struct>`. Literal types are uniqued structurally, but identified types -are never uniqued. An :ref:`opaque structural type <t_opaque>` can also be used +types <t_struct>`. Literal types are uniqued structurally, but identified types +are never uniqued. An :ref:`opaque structural type <t_opaque>` can also be used to forward declare a type that is not yet available. -An example of a identified structure specification is: +An example of an identified structure specification is: .. code-block:: llvm %mytype = type { %mytype*, i32 } -Prior to the LLVM 3.0 release, identified types were structurally uniqued. Only +Prior to the LLVM 3.0 release, identified types were structurally uniqued. Only literal types are uniqued in recent versions of LLVM. .. _globalvars: @@ -569,7 +593,7 @@ support. By default, global initializers are optimized by assuming that global variables defined within the module are not modified from their -initial values before the start of the global initializer. This is +initial values before the start of the global initializer. This is true even for variables potentially accessible from outside the module, including those with external linkage or appearing in ``@llvm.used`` or dllexported variables. 
This assumption may be suppressed @@ -637,6 +661,7 @@ an optional :ref:`comdat <langref_comdats>`, an optional :ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`, an optional :ref:`prologue <prologuedata>`, an optional :ref:`personality <personalityfn>`, +an optional list of attached :ref:`metadata <metadata>`, an opening curly brace, a list of basic blocks, and a closing curly brace. LLVM function declarations consist of the "``declare``" keyword, an @@ -685,10 +710,10 @@ Syntax:: <ResultType> @<FunctionName> ([argument list]) [unnamed_addr] [fn Attrs] [section "name"] [comdat [($name)]] [align N] [gc] [prefix Constant] [prologue Constant] - [personality Constant] { ... } + [personality Constant] (!name !N)* { ... } -The argument list is a comma seperated sequence of arguments where each -argument is of the following form +The argument list is a comma separated sequence of arguments where each +argument is of the following form: Syntax:: @@ -712,7 +737,7 @@ Aliases may have an optional :ref:`linkage type <linkage>`, an optional Syntax:: - @<Name> = [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] alias <AliaseeTy> @<Aliasee> + @<Name> = [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] alias <AliaseeTy>, <AliaseeTy>* @<Aliasee> The linkage must be one of ``private``, ``internal``, ``linkonce``, ``weak``, ``linkonce_odr``, ``weak_odr``, ``external``. Note that some system linkers @@ -742,9 +767,9 @@ Comdats Comdat IR provides access to COFF and ELF object file COMDAT functionality. -Comdats have a name which represents the COMDAT key. All global objects that +Comdats have a name which represents the COMDAT key. All global objects that specify this key will only end up in the final object file if the linker chooses -that key over some other key. Aliases are placed in the same COMDAT that their +that key over some other key. Aliases are placed in the same COMDAT that their aliasee computes to, if any. Comdats have a selection kind to provide input on how the linker should @@ -819,13 +844,13 @@ For example: @g2 = global i32 42, section "sec", comdat($bar) From the object file perspective, this requires the creation of two sections -with the same name. This is necessary because both globals belong to different +with the same name. This is necessary because both globals belong to different COMDAT groups and COMDATs, at the object file level, are represented by sections. Note that certain IR constructs like global variables and functions may create COMDATs in the object file in addition to any which are specified using -COMDAT IR. This arises when the code generator is configured to emit globals +COMDAT IR. This arises when the code generator is configured to emit globals in individual sections (e.g. when `-data-sections` or `-function-sections` is supplied to `llc`). @@ -891,7 +916,7 @@ Currently, only the following parameter attributes are defined: the callee (for a return value). ``inreg`` This indicates that this parameter or return value should be treated - in a special target-dependent fashion during while emitting code for + in a special target-dependent fashion while emitting code for a function call or return (usually, by putting it in a register as opposed to memory, though some targets use it to distinguish between two different kinds of registers). 
Use of this attribute is @@ -919,23 +944,23 @@ Currently, only the following parameter attributes are defined: ``inalloca`` The ``inalloca`` argument attribute allows the caller to take the - address of outgoing stack arguments. An ``inalloca`` argument must + address of outgoing stack arguments. An ``inalloca`` argument must be a pointer to stack memory produced by an ``alloca`` instruction. The alloca, or argument allocation, must also be tagged with the - inalloca keyword. Only the last argument may have the ``inalloca`` + inalloca keyword. Only the last argument may have the ``inalloca`` attribute, and that argument is guaranteed to be passed in memory. An argument allocation may be used by a call at most once because - the call may deallocate it. The ``inalloca`` attribute cannot be + the call may deallocate it. The ``inalloca`` attribute cannot be used in conjunction with other attributes that affect argument - storage, like ``inreg``, ``nest``, ``sret``, or ``byval``. The + storage, like ``inreg``, ``nest``, ``sret``, or ``byval``. The ``inalloca`` attribute also disables LLVM's implicit lowering of large aggregate return values, which means that frontend authors must lower them with ``sret`` pointers. When the call site is reached, the argument allocation must have been the most recent stack allocation that is still live, or the - results are undefined. It is possible to allocate additional stack + results are undefined. It is possible to allocate additional stack space after an argument allocation and before its call site, but it must be cleared off with :ref:`llvm.stackrestore <int_stackrestore>`. @@ -1024,14 +1049,14 @@ Currently, only the following parameter attributes are defined: ``dereferenceable_or_null(<n>)`` This indicates that the parameter or return value isn't both non-null and non-dereferenceable (up to ``<n>`` bytes) at the same - time. All non-null pointers tagged with + time. All non-null pointers tagged with ``dereferenceable_or_null(<n>)`` are ``dereferenceable(<n>)``. For address space 0 ``dereferenceable_or_null(<n>)`` implies that a pointer is exactly one of ``dereferenceable(<n>)`` or ``null``, and in other address spaces ``dereferenceable_or_null(<n>)`` implies that a pointer is at least one of ``dereferenceable(<n>)`` or ``null`` (i.e. it may be both ``null`` and - ``dereferenceable(<n>)``). This attribute may only be applied to + ``dereferenceable(<n>)``). This attribute may only be applied to pointer typed parameters. .. _gc: @@ -1047,9 +1072,9 @@ string: define void @f() gc "name" { ... } The supported values of *name* includes those :ref:`built in to LLVM -<builtin-gc-strategies>` and any provided by loaded plugins. Specifying a GC +<builtin-gc-strategies>` and any provided by loaded plugins. Specifying a GC strategy will cause the compiler to alter its output in order to support the -named garbage collection algorithm. Note that LLVM itself does not contain a +named garbage collection algorithm. Note that LLVM itself does not contain a garbage collector, this functionality is restricted to generating machine code which can interoperate with a collector provided externally. @@ -1067,7 +1092,7 @@ function pointer to be called. To access the data for a given function, a program may bitcast the function pointer to a pointer to the constant's type and dereference -index -1. This implies that the IR symbol points just past the end of +index -1. This implies that the IR symbol points just past the end of the prefix data. 
For instance, take the example of a function annotated with a single ``i32``, @@ -1084,14 +1109,14 @@ The prefix data can be referenced as, %b = load i32, i32* %a Prefix data is laid out as if it were an initializer for a global variable -of the prefix data's type. The function will be placed such that the +of the prefix data's type. The function will be placed such that the beginning of the prefix data is aligned. This means that if the size of the prefix data is not a multiple of the alignment size, the function's entrypoint will not be aligned. If alignment of the function's entrypoint is desired, padding must be added to the prefix data. -A function may have prefix data but no body. This has similar semantics +A function may have prefix data but no body. This has similar semantics to the ``available_externally`` linkage in that the data may be used by the optimizers but will not be emitted in the object file. @@ -1105,12 +1130,12 @@ be inserted prior to the function body. This can be used for enabling function hot-patching and instrumentation. To maintain the semantics of ordinary function calls, the prologue data must -have a particular format. Specifically, it must begin with a sequence of +have a particular format. Specifically, it must begin with a sequence of bytes which decode to a sequence of machine instructions, valid for the module's target, which transfer control to the point immediately succeeding -the prologue data, without performing any other visible action. This allows +the prologue data, without performing any other visible action. This allows the inliner and other passes to reason about the semantics of the function -definition without needing to reason about the prologue data. Obviously this +definition without needing to reason about the prologue data. Obviously this makes the format of the prologue data highly target dependent. A trivial example of valid prologue data for the x86 architecture is ``i8 144``, @@ -1130,7 +1155,7 @@ x86_64 architecture, where the first two bytes encode ``jmp .+10``: define void @f() prologue %0 <{ i8 235, i8 8, i8* @md}> { ... } -A function may have prologue data but no body. This has similar semantics +A function may have prologue data but no body. This has similar semantics to the ``available_externally`` linkage in that the data may be used by the optimizers but will not be emitted in the object file. @@ -1216,10 +1241,16 @@ example: ``convergent`` This attribute indicates that the callee is dependent on a convergent thread execution pattern under certain parallel execution models. - Transformations that are execution model agnostic may only move or - tranform this call if the final location is control equivalent to its - original position in the program, where control equivalence is defined as - A dominates B and B post-dominates A, or vice versa. + Transformations that are execution model agnostic may not make the execution + of a convergent operation control dependent on any additional values. +``inaccessiblememonly`` + This attribute indicates that the function may only access memory that + is not accessible by the module being compiled. This is a weaker form + of ``readnone``. +``inaccessiblemem_or_argmemonly`` + This attribute indicates that the function may only access memory that is + either not accessible by the module being compiled, or is pointed to + by its pointer arguments. 
This is a weaker form of ``argmemonly`` ``inlinehint`` This attribute indicates that the source code contained a hint that inlining this function is desirable (such as the "inline" keyword in @@ -1275,6 +1306,10 @@ example: This function attribute indicates that the function never returns normally. This produces undefined behavior at runtime if the function ever does dynamically return. +``norecurse`` + This function attribute indicates that the function does not call itself + either directly or indirectly down any possible call path. This produces + undefined behavior at runtime if the function ever does recurse. ``nounwind`` This function attribute indicates that the function never raises an exception. If the function does raise an exception, its runtime @@ -1283,9 +1318,9 @@ example: that are recognized by LLVM to handle asynchronous exceptions, such as SEH, will still provide their implementation defined semantics. ``optnone`` - This function attribute indicates that the function is not optimized - by any optimization or code generator passes with the - exception of interprocedural optimization passes. + This function attribute indicates that most optimization passes will skip + this function, with the exception of interprocedural optimization passes. + Code generation defaults to the "fast" instruction selector. This attribute cannot be used together with the ``alwaysinline`` attribute; this attribute is also incompatible with the ``minsize`` attribute and the ``optsize`` attribute. @@ -1399,7 +1434,7 @@ example: ``sspstrong`` This attribute indicates that the function should emit a stack smashing protector. This attribute causes a strong heuristic to be used when - determining if a function needs stack protectors. The strong heuristic + determining if a function needs stack protectors. The strong heuristic will enable protectors for functions with: - Arrays of any size and type @@ -1430,11 +1465,129 @@ example: match the thunk target prototype. ``uwtable`` This attribute indicates that the ABI being targeted requires that - an unwind table entry be produce for this function even if we can + an unwind table entry be produced for this function even if we can show that no exceptions passes by it. This is normally the case for the ELF x86-64 abi, but it can be disabled for some compilation units. + +.. _opbundles: + +Operand Bundles +--------------- + +Note: operand bundles are a work in progress, and they should be +considered experimental at this time. + +Operand bundles are tagged sets of SSA values that can be associated +with certain LLVM instructions (currently only ``call`` s and +``invoke`` s). In a way they are like metadata, but dropping them is +incorrect and will change program semantics. + +Syntax:: + + operand bundle set ::= '[' operand bundle (, operand bundle )* ']' + operand bundle ::= tag '(' [ bundle operand ] (, bundle operand )* ')' + bundle operand ::= SSA value + tag ::= string constant + +Operand bundles are **not** part of a function's signature, and a +given function may be called from multiple places with different kinds +of operand bundles. This reflects the fact that the operand bundles +are conceptually a part of the ``call`` (or ``invoke``), not the +callee being dispatched to. + +Operand bundles are a generic mechanism intended to support +runtime-introspection-like functionality for managed languages. 
While +the exact semantics of an operand bundle depend on the bundle tag, +there are certain limitations to how much the presence of an operand +bundle can influence the semantics of a program. These restrictions +are described as the semantics of an "unknown" operand bundle. As +long as the behavior of an operand bundle is describable within these +restrictions, LLVM does not need to have special knowledge of the +operand bundle to not miscompile programs containing it. + +- The bundle operands for an unknown operand bundle escape in unknown + ways before control is transferred to the callee or invokee. +- Calls and invokes with operand bundles have unknown read / write + effect on the heap on entry and exit (even if the call target is + ``readnone`` or ``readonly``), unless they're overriden with + callsite specific attributes. +- An operand bundle at a call site cannot change the implementation + of the called function. Inter-procedural optimizations work as + usual as long as they take into account the first two properties. + +More specific types of operand bundles are described below. + +Deoptimization Operand Bundles +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Deoptimization operand bundles are characterized by the ``"deopt"`` +operand bundle tag. These operand bundles represent an alternate +"safe" continuation for the call site they're attached to, and can be +used by a suitable runtime to deoptimize the compiled frame at the +specified call site. There can be at most one ``"deopt"`` operand +bundle attached to a call site. Exact details of deoptimization is +out of scope for the language reference, but it usually involves +rewriting a compiled frame into a set of interpreted frames. + +From the compiler's perspective, deoptimization operand bundles make +the call sites they're attached to at least ``readonly``. They read +through all of their pointer typed operands (even if they're not +otherwise escaped) and the entire visible heap. Deoptimization +operand bundles do not capture their operands except during +deoptimization, in which case control will not be returned to the +compiled frame. + +The inliner knows how to inline through calls that have deoptimization +operand bundles. Just like inlining through a normal call site +involves composing the normal and exceptional continuations, inlining +through a call site with a deoptimization operand bundle needs to +appropriately compose the "safe" deoptimization continuation. The +inliner does this by prepending the parent's deoptimization +continuation to every deoptimization continuation in the inlined body. +E.g. inlining ``@f`` into ``@g`` in the following example + +.. code-block:: llvm + + define void @f() { + call void @x() ;; no deopt state + call void @y() [ "deopt"(i32 10) ] + call void @y() [ "deopt"(i32 10), "unknown"(i8* null) ] + ret void + } + + define void @g() { + call void @f() [ "deopt"(i32 20) ] + ret void + } + +will result in + +.. code-block:: llvm + + define void @g() { + call void @x() ;; still no deopt state + call void @y() [ "deopt"(i32 20, i32 10) ] + call void @y() [ "deopt"(i32 20, i32 10), "unknown"(i8* null) ] + ret void + } + +It is the frontend's responsibility to structure or encode the +deoptimization state in a way that syntactically prepending the +caller's deoptimization state to the callee's deoptimization state is +semantically equivalent to composing the caller's deoptimization +continuation after the callee's deoptimization continuation. 
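+
+One encoding that composes correctly under this prepending rule (purely
+illustrative; LLVM does not mandate any particular encoding) is a flat list
+of per-frame records, each holding a function identifier, a bytecode index,
+and that frame's live values:
+
+.. code-block:: llvm
+
+  ;; Callee frame only: (function id 1, bytecode index 42, live value %a).
+  call void @y() [ "deopt"(i32 1, i32 42, i32 %a) ]
+
+  ;; After inlining into a caller whose record is (function id 2, bytecode
+  ;; index 7), the caller's record is prepended, describing both abstract
+  ;; frames outermost-first.
+  call void @y() [ "deopt"(i32 2, i32 7, i32 1, i32 42, i32 %a) ]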
+ +Funclet Operand Bundles +^^^^^^^^^^^^^^^^^^^^^^^ + +Funclet operand bundles are characterized by the ``"funclet"`` +operand bundle tag. These operand bundles indicate that a call site +is within a particular funclet. There can be at most one +``"funclet"`` operand bundle attached to a call site and it must have +exactly one bundle operand. + .. _moduleasm: Module-Level Inline Assembly @@ -1494,8 +1647,8 @@ as follows: ``p[n]:<size>:<abi>:<pref>`` This specifies the *size* of a pointer and its ``<abi>`` and ``<pref>``\erred alignments for address space ``n``. All sizes are in - bits. The address space, ``n`` is optional, and if not specified, - denotes the default address space 0. The value of ``n`` must be + bits. The address space, ``n``, is optional, and if not specified, + denotes the default address space 0. The value of ``n`` must be in the range [1,2^23). ``i<size>:<abi>:<pref>`` This specifies the alignment for an integer type of a given bit @@ -1521,6 +1674,8 @@ as follows: symbols get a ``_`` prefix. * ``w``: Windows COFF prefix: Similar to Mach-O, but stdcall and fastcall functions also get a suffix based on the frame size. + * ``x``: Windows x86 COFF prefix: Similar to Windows COFF, but use a ``_`` + prefix for ``__cdecl`` functions. ``n<size1>:<size2>:<size3>...`` This specifies a set of native integer widths for the target CPU in bits. For example, it might contain ``n32`` for 32-bit PowerPC, @@ -1687,7 +1842,7 @@ target-legal volatile load/store instructions. this holds for an l-value of volatile primitive type with native hardware support, but not necessarily for aggregate types. The frontend upholds these expectations, which are intentionally - unspecified in the IR. The rules above ensure that IR transformation + unspecified in the IR. The rules above ensure that IR transformations do not violate the frontend's contract with the language. .. _memmodel: @@ -1877,12 +2032,12 @@ Use-list Order Directives ------------------------- Use-list directives encode the in-memory order of each use-list, allowing the -order to be recreated. ``<order-indexes>`` is a comma-separated list of -indexes that are assigned to the referenced value's uses. The referenced +order to be recreated. ``<order-indexes>`` is a comma-separated list of +indexes that are assigned to the referenced value's uses. The referenced value's use-list is immediately sorted by these indexes. -Use-list directives may appear at function scope or global scope. They are not -instructions, and have no effect on the semantics of the IR. When they're at +Use-list directives may appear at function scope or global scope. They are not +instructions, and have no effect on the semantics of the IR. When they're at function scope, they must appear after the terminator of the final basic block. If basic blocks have their address taken via ``blockaddress()`` expressions, @@ -1969,9 +2124,9 @@ and :ref:`metadata <t_metadata>` types. ...where '``<parameter list>``' is a comma-separated list of type specifiers. Optionally, the parameter list may include a type ``...``, which -indicates that the function takes a variable number of arguments. Variable +indicates that the function takes a variable number of arguments. Variable argument functions can access their arguments with the :ref:`variable argument -handling intrinsic <int_varargs>` functions. '``<returntype>``' is any type +handling intrinsic <int_varargs>` functions. '``<returntype>``' is any type except :ref:`label <t_label>` and :ref:`metadata <t_metadata>`. 
:Examples: @@ -2165,6 +2320,26 @@ The label type represents code labels. label +.. _t_token: + +Token Type +^^^^^^^^^^ + +:Overview: + +The token type is used when a value is associated with an instruction +but all uses of the value must not attempt to introspect or obscure it. +As such, it is not appropriate to have a :ref:`phi <i_phi>` or +:ref:`select <i_select>` of type token. + +:Syntax: + +:: + + token + + + .. _t_metadata: Metadata Type @@ -2338,6 +2513,9 @@ Simple Constants **Null pointer constants** The identifier '``null``' is recognized as a null pointer constant and must be of :ref:`pointer type <t_pointer>`. +**Token constants** + The identifier '``none``' is recognized as an empty token constant + and must be of :ref:`token type <t_token>`. The one non-intuitive notation for constants is the hexadecimal form of floating point constants. For example, the form @@ -2406,8 +2584,8 @@ constants and smaller complex constants. having to print large zero initializers (e.g. for large arrays) and is always exactly equivalent to using explicit zero initializers. **Metadata node** - A metadata node is a constant tuple without types. For example: - "``!{!0, !{!2, !0}, !"test"}``". Metadata can reference constant values, + A metadata node is a constant tuple without types. For example: + "``!{!0, !{!2, !0}, !"test"}``". Metadata can reference constant values, for example: "``!{!0, i32 0, i8* @global, i64 (i64)* @function, !"str"}``". Unlike other typed constants that are meant to be interpreted as part of the instruction stream, metadata is a place to attach additional @@ -3325,7 +3503,7 @@ and GCC likely indicates a bug in LLVM. Target-independent: -- ``c``: Print an immediate integer constant unadorned, without +- ``c``: Print an immediate integer constant unadorned, without the target-specific immediate punctuation (e.g. no ``$`` prefix). - ``n``: Negate and print immediate integer constant unadorned, without the target-specific immediate punctuation (e.g. no ``$`` prefix). @@ -3505,7 +3683,7 @@ that can convey extra information about the code to the optimizers and code generator. One example application of metadata is source-level debug information. There are two metadata primitives: strings and nodes. -Metadata does not have a type, and is not a value. If referenced from a +Metadata does not have a type, and is not a value. If referenced from a ``call`` instruction, it uses the ``metadata`` type. All metadata are identified in syntax by a exclamation point ('``!``'). @@ -3536,7 +3714,7 @@ Metadata nodes that aren't uniqued use the ``distinct`` keyword. For example: !0 = distinct !{!"test\00", i32 10} ``distinct`` nodes are useful when nodes shouldn't be merged based on their -content. They can also occur when transformations cause uniquing collisions +content. They can also occur when transformations cause uniquing collisions when metadata operands change. A :ref:`named metadata <namedmetadatastructure>` is a collection of @@ -3554,13 +3732,22 @@ function is using two metadata arguments: call void @llvm.dbg.value(metadata !24, i64 0, metadata !25) -Metadata can be attached with an instruction. Here metadata ``!21`` is -attached to the ``add`` instruction using the ``!dbg`` identifier: +Metadata can be attached to an instruction. Here metadata ``!21`` is attached +to the ``add`` instruction using the ``!dbg`` identifier: .. code-block:: llvm %indvar.next = add i64 %indvar, 1, !dbg !21 +Metadata can also be attached to a function definition. 
Here metadata ``!22`` +is attached to the ``foo`` function using the ``!dbg`` identifier: + +.. code-block:: llvm + + define void @foo() !dbg !22 { + ret void + } + More information about specific metadata nodes recognized by the optimizers and code generator is found below. @@ -3570,7 +3757,7 @@ Specialized Metadata Nodes ^^^^^^^^^^^^^^^^^^^^^^^^^^ Specialized metadata nodes are custom data structures in metadata (as opposed -to generic tuples). Their fields are labelled, and can be specified in any +to generic tuples). Their fields are labelled, and can be specified in any order. These aren't inherently debug info centric, but currently all the specialized @@ -3581,10 +3768,10 @@ metadata nodes are related to debug info. DICompileUnit """"""""""""" -``DICompileUnit`` nodes represent a compile unit. The ``enums:``, -``retainedTypes:``, ``subprograms:``, ``globals:`` and ``imports:`` fields are -tuples containing the debug info to be emitted along with the compile unit, -regardless of code optimizations (some nodes are only emitted if there are +``DICompileUnit`` nodes represent a compile unit. The ``enums:``, +``retainedTypes:``, ``subprograms:``, ``globals:``, ``imports:`` and ``macros:`` +fields are tuples containing the debug info to be emitted along with the compile +unit, regardless of code optimizations (some nodes are only emitted if there are references to them from instructions). .. code-block:: llvm @@ -3593,11 +3780,11 @@ references to them from instructions). isOptimized: true, flags: "-O2", runtimeVersion: 2, splitDebugFilename: "abc.debug", emissionKind: 1, enums: !2, retainedTypes: !3, subprograms: !4, - globals: !5, imports: !6) + globals: !5, imports: !6, macros: !7, dwoId: 0x0abcd) Compile unit descriptors provide the root scope for objects declared in a -specific compilation unit. File descriptors are defined using this scope. -These descriptors are collected by a named metadata ``!llvm.dbg.cu``. They +specific compilation unit. File descriptors are defined using this scope. +These descriptors are collected by a named metadata ``!llvm.dbg.cu``. They keep track of subprograms, global variables, type information, and imported entities (declarations and namespaces). @@ -3606,7 +3793,7 @@ entities (declarations and namespaces). DIFile """""" -``DIFile`` nodes represent files. The ``filename:`` can include slashes. +``DIFile`` nodes represent files. The ``filename:`` can include slashes. .. code-block:: llvm @@ -3621,7 +3808,7 @@ DIBasicType """"""""""" ``DIBasicType`` nodes represent primitive types, such as ``int``, ``bool`` and -``float``. ``tag:`` defaults to ``DW_TAG_base_type``. +``float``. ``tag:`` defaults to ``DW_TAG_base_type``. .. code-block:: llvm @@ -3629,7 +3816,7 @@ DIBasicType encoding: DW_ATE_unsigned_char) !1 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)") -The ``encoding:`` describes the details of the type. Usually it's one of the +The ``encoding:`` describes the details of the type. Usually it's one of the following: .. code-block:: llvm @@ -3647,9 +3834,9 @@ following: DISubroutineType """""""""""""""" -``DISubroutineType`` nodes represent subroutine types. Their ``types:`` field +``DISubroutineType`` nodes represent subroutine types. Their ``types:`` field refers to a tuple; the first operand is the return type, while the rest are the -types of the formal arguments in order. If the first operand is ``null``, that +types of the formal arguments in order. 
If the first operand is ``null``, that represents a function with no return value (such as ``void foo() {}`` in C++). .. code-block:: llvm @@ -3688,8 +3875,8 @@ The following ``tag:`` values are valid: DW_TAG_restrict_type = 55 ``DW_TAG_member`` is used to define a member of a :ref:`composite type -<DICompositeType>` or :ref:`subprogram <DISubprogram>`. The type of the member -is the ``baseType:``. The ``offset:`` is the member's bit offset. +<DICompositeType>` or :ref:`subprogram <DISubprogram>`. The type of the member +is the ``baseType:``. The ``offset:`` is the member's bit offset. ``DW_TAG_formal_parameter`` is used to define a member which is a formal argument of a subprogram. @@ -3707,10 +3894,10 @@ DICompositeType """"""""""""""" ``DICompositeType`` nodes represent types composed of other types, like -structures and unions. ``elements:`` points to a tuple of the composed types. +structures and unions. ``elements:`` points to a tuple of the composed types. If the source language supports ODR, the ``identifier:`` field gives the unique -identifier used for type merging between modules. When specified, other types +identifier used for type merging between modules. When specified, other types can refer to composite types indirectly via a :ref:`metadata string <metadata-string>` that matches their identifier. @@ -3738,12 +3925,12 @@ The following ``tag:`` values are valid: For ``DW_TAG_array_type``, the ``elements:`` should be :ref:`subrange descriptors <DISubrange>`, each representing the range of subscripts at that -level of indexing. The ``DIFlagVector`` flag to ``flags:`` indicates that an +level of indexing. The ``DIFlagVector`` flag to ``flags:`` indicates that an array type is a native packed vector. For ``DW_TAG_enumeration_type``, the ``elements:`` should be :ref:`enumerator descriptors <DIEnumerator>`, each representing the definition of an enumeration -value for the set. All enumeration type descriptors are collected in the +value for the set. All enumeration type descriptors are collected in the ``enums:`` field of the :ref:`compile unit <DICompileUnit>`. For ``DW_TAG_structure_type``, ``DW_TAG_class_type``, and @@ -3756,7 +3943,7 @@ DISubrange """""""""" ``DISubrange`` nodes are the elements for ``DW_TAG_array_type`` variants of -:ref:`DICompositeType`. ``count: -1`` indicates an empty array. +:ref:`DICompositeType`. ``count: -1`` indicates an empty array. .. code-block:: llvm @@ -3782,7 +3969,7 @@ DITemplateTypeParameter """"""""""""""""""""""" ``DITemplateTypeParameter`` nodes represent type parameters to generic source -language constructs. They are used (optionally) in :ref:`DICompositeType` and +language constructs. They are used (optionally) in :ref:`DICompositeType` and :ref:`DISubprogram` ``templateParams:`` fields. .. code-block:: llvm @@ -3793,9 +3980,9 @@ DITemplateValueParameter """""""""""""""""""""""" ``DITemplateValueParameter`` nodes represent value parameters to generic source -language constructs. ``tag:`` defaults to ``DW_TAG_template_value_parameter``, +language constructs. ``tag:`` defaults to ``DW_TAG_template_value_parameter``, but if specified can also be set to ``DW_TAG_GNU_template_template_param`` or -``DW_TAG_GNU_template_param_pack``. They are used (optionally) in +``DW_TAG_GNU_template_param_pack``. They are used (optionally) in :ref:`DICompositeType` and :ref:`DISubprogram` ``templateParams:`` fields. .. 
code-block:: llvm @@ -3831,20 +4018,26 @@ All global variables should be referenced by the `globals:` field of a DISubprogram """""""""""" -``DISubprogram`` nodes represent functions from the source language. The -``variables:`` field points at :ref:`variables <DILocalVariable>` that must be -retained, even if their IR counterparts are optimized out of the IR. The -``type:`` field must point at an :ref:`DISubroutineType`. +``DISubprogram`` nodes represent functions from the source language. A +``DISubprogram`` may be attached to a function definition using ``!dbg`` +metadata. The ``variables:`` field points at :ref:`variables <DILocalVariable>` +that must be retained, even if their IR counterparts are optimized out of +the IR. The ``type:`` field must point at an :ref:`DISubroutineType`. .. code-block:: llvm - !0 = !DISubprogram(name: "foo", linkageName: "_Zfoov", scope: !1, - file: !2, line: 7, type: !3, isLocal: true, - isDefinition: false, scopeLine: 8, containingType: !4, - virtuality: DW_VIRTUALITY_pure_virtual, virtualIndex: 10, - flags: DIFlagPrototyped, isOptimized: true, - function: void ()* @_Z3foov, - templateParams: !5, declaration: !6, variables: !7) + define void @_Z3foov() !dbg !0 { + ... + } + + !0 = distinct !DISubprogram(name: "foo", linkageName: "_Zfoov", scope: !1, + file: !2, line: 7, type: !3, isLocal: true, + isDefinition: false, scopeLine: 8, + containingType: !4, + virtuality: DW_VIRTUALITY_pure_virtual, + virtualIndex: 10, flags: DIFlagPrototyped, + isOptimized: true, templateParams: !5, + declaration: !6, variables: !7) .. _DILexicalBlock: @@ -3852,8 +4045,8 @@ DILexicalBlock """""""""""""" ``DILexicalBlock`` nodes describe nested blocks within a :ref:`subprogram -<DISubprogram>`. The line number and column numbers are used to dinstinguish -two lexical blocks at same depth. They are valid targets for ``scope:`` +<DISubprogram>`. The line number and column numbers are used to distinguish +two lexical blocks at same depth. They are valid targets for ``scope:`` fields. .. code-block:: llvm @@ -3869,7 +4062,7 @@ DILexicalBlockFile """""""""""""""""" ``DILexicalBlockFile`` nodes are used to discriminate between sections of a -:ref:`lexical block <DILexicalBlock>`. The ``file:`` field can be changed to +:ref:`lexical block <DILexicalBlock>`. The ``file:`` field can be changed to indicate textual inclusion, or the ``discriminator:`` field can be used to discriminate between control flow within a single block in the source language. @@ -3884,7 +4077,7 @@ discriminate between control flow within a single block in the source language. DILocation """""""""" -``DILocation`` nodes represent source debug locations. The ``scope:`` field is +``DILocation`` nodes represent source debug locations. The ``scope:`` field is mandatory, and points at an :ref:`DILexicalBlockFile`, an :ref:`DILexicalBlock`, or an :ref:`DISubprogram`. @@ -3897,27 +4090,23 @@ mandatory, and points at an :ref:`DILexicalBlockFile`, an DILocalVariable """"""""""""""" -``DILocalVariable`` nodes represent local variables in the source language. -Instead of ``DW_TAG_variable``, they use LLVM-specific fake tags to -discriminate between local variables (``DW_TAG_auto_variable``) and subprogram -arguments (``DW_TAG_arg_variable``). In the latter case, the ``arg:`` field -specifies the argument position, and this variable will be included in the -``variables:`` field of its :ref:`DISubprogram`. +``DILocalVariable`` nodes represent local variables in the source language. 
If +the ``arg:`` field is set to non-zero, then this variable is a subprogram +parameter, and it will be included in the ``variables:`` field of its +:ref:`DISubprogram`. .. code-block:: llvm - !0 = !DILocalVariable(tag: DW_TAG_arg_variable, name: "this", arg: 0, - scope: !3, file: !2, line: 7, type: !3, - flags: DIFlagArtificial) - !1 = !DILocalVariable(tag: DW_TAG_arg_variable, name: "x", arg: 1, - scope: !4, file: !2, line: 7, type: !3) - !1 = !DILocalVariable(tag: DW_TAG_auto_variable, name: "y", - scope: !5, file: !2, line: 7, type: !3) + !0 = !DILocalVariable(name: "this", arg: 1, scope: !3, file: !2, line: 7, + type: !3, flags: DIFlagArtificial) + !1 = !DILocalVariable(name: "x", arg: 2, scope: !4, file: !2, line: 7, + type: !3) + !2 = !DILocalVariable(name: "y", scope: !5, file: !2, line: 7, type: !3) DIExpression """""""""""" -``DIExpression`` nodes represent DWARF expression sequences. They are used in +``DIExpression`` nodes represent DWARF expression sequences. They are used in :ref:`debug intrinsics<dbg_intrinsics>` (such as ``llvm.dbg.declare``) to describe how the referenced LLVM variable relates to the source language variable. @@ -3957,6 +4146,32 @@ compile unit. !2 = !DIImportedEntity(tag: DW_TAG_imported_module, name: "foo", scope: !0, entity: !1, line: 7) +DIMacro +""""""" + +``DIMacro`` nodes represent definition or undefinition of a macro identifiers. +The ``name:`` field is the macro identifier, followed by macro parameters when +definining a function-like macro, and the ``value`` field is the token-string +used to expand the macro identifier. + +.. code-block:: llvm + + !2 = !DIMacro(macinfo: DW_MACINFO_define, line: 7, name: "foo(x)", + value: "((x) + 1)") + !3 = !DIMacro(macinfo: DW_MACINFO_undef, line: 30, name: "foo") + +DIMacroFile +""""""""""" + +``DIMacroFile`` nodes represent inclusion of source files. +The ``nodes:`` field is a list of ``DIMacro`` and ``DIMacroFile`` nodes that +appear in the included source file. + +.. code-block:: llvm + + !2 = !DIMacroFile(macinfo: DW_MACINFO_start_file, line: 7, file: !2, + nodes: !3) + '``tbaa``' Metadata ^^^^^^^^^^^^^^^^^^^ @@ -4041,13 +4256,13 @@ alias. The metadata identifying each domain is itself a list containing one or two entries. The first entry is the name of the domain. Note that if the name is a -string then it can be combined accross functions and translation units. A +string then it can be combined across functions and translation units. A self-reference can be used to create globally unique domain names. A descriptive string may optionally be provided as a second list entry. The metadata identifying each scope is also itself a list containing two or three entries. The first entry is the name of the scope. Note that if the name -is a string then it can be combined accross functions and translation units. A +is a string then it can be combined across functions and translation units. A self-reference can be used to create globally unique scope names. A metadata reference to the scope's domain is the second entry. A descriptive string may optionally be provided as a third list entry. @@ -4144,6 +4359,16 @@ Examples: !2 = !{ i8 0, i8 2, i8 3, i8 6 } !3 = !{ i8 -2, i8 0, i8 3, i8 6 } +'``unpredictable``' Metadata +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``unpredictable`` metadata may be attached to any branch or switch +instruction. It can be used to express the unpredictability of control +flow. Similar to the llvm.expect intrinsic, it may be used to alter +optimizations related to compare and branch instructions. 
The metadata +is treated as a boolean value; if it exists, it signals that the branch +or switch that it is attached to is completely unpredictable. + '``llvm.loop``' ^^^^^^^^^^^^^^^ @@ -4182,11 +4407,11 @@ suggests an unroll factor to the loop unroller: Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are used to control per-loop vectorization and interleaving parameters such as -vectorization width and interleave count. These metadata should be used in -conjunction with ``llvm.loop`` loop identification metadata. The +vectorization width and interleave count. These metadata should be used in +conjunction with ``llvm.loop`` loop identification metadata. The ``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only optimization hints and the optimizer will only interleave and vectorize loops if -it believes it is safe to do so. The ``llvm.mem.parallel_loop_access`` metadata +it believes it is safe to do so. The ``llvm.mem.parallel_loop_access`` metadata which contains information about loop-carried memory dependencies can be helpful in determining the safety of these transformations. @@ -4203,7 +4428,7 @@ example: !0 = !{!"llvm.loop.interleave.count", i32 4} Note that setting ``llvm.loop.interleave.count`` to 1 disables interleaving -multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0 +multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0 then the interleave count will be determined automatically. '``llvm.loop.vectorize.enable``' Metadata @@ -4211,7 +4436,7 @@ then the interleave count will be determined automatically. This metadata selectively enables or disables vectorization for the loop. The first operand is the string ``llvm.loop.vectorize.enable`` and the second operand -is a bit. If the bit operand value is 1 vectorization is enabled. A value of +is a bit. If the bit operand value is 1 vectorization is enabled. A value of 0 disables vectorization: .. code-block:: llvm @@ -4231,7 +4456,7 @@ operand is an integer specifying the width. For example: !0 = !{!"llvm.loop.vectorize.width", i32 4} Note that setting ``llvm.loop.vectorize.width`` to 1 disables -vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to +vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to 0 or if the loop does not have this metadata the width will be determined automatically. @@ -4264,7 +4489,7 @@ will be partially unrolled. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This metadata disables loop unrolling. The metadata has a single operand -which is the string ``llvm.loop.unroll.disable``. For example: +which is the string ``llvm.loop.unroll.disable``. For example: .. code-block:: llvm @@ -4274,12 +4499,24 @@ which is the string ``llvm.loop.unroll.disable``. For example: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This metadata disables runtime loop unrolling. The metadata has a single -operand which is the string ``llvm.loop.unroll.runtime.disable``. For example: +operand which is the string ``llvm.loop.unroll.runtime.disable``. For example: .. code-block:: llvm !0 = !{!"llvm.loop.unroll.runtime.disable"} +'``llvm.loop.unroll.enable``' Metadata +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This metadata suggests that the loop should be fully unrolled if the trip count +is known at compile time and partially unrolled if the trip count is not known +at compile time. The metadata has a single operand which is the string +``llvm.loop.unroll.enable``. For example: + +.. 
code-block:: llvm + + !0 = !{!"llvm.loop.unroll.enable"} + '``llvm.loop.unroll.full``' Metadata ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -4319,7 +4556,7 @@ loop. Note that if not all memory access instructions have such metadata referring to the loop, then the loop is considered not being trivially parallel. Additional -memory dependence analysis is required to make that determination. As a fail +memory dependence analysis is required to make that determination. As a fail safe mechanism, this causes loops that were originally parallel to be considered sequential (if optimization passes that are unaware of the parallel semantics insert new memory instructions into the loop body). @@ -4380,6 +4617,50 @@ the loop identifier metadata node directly: The ``llvm.bitsets`` global metadata is used to implement :doc:`bitsets <BitSets>`. +'``invariant.group``' Metadata +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``invariant.group`` metadata may be attached to ``load``/``store`` instructions. +The existence of the ``invariant.group`` metadata on the instruction tells +the optimizer that every ``load`` and ``store`` to the same pointer operand +within the same invariant group can be assumed to load or store the same +value (but see the ``llvm.invariant.group.barrier`` intrinsic which affects +when two pointers are considered the same). + +Examples: + +.. code-block:: llvm + + @unknownPtr = external global i8 + ... + %ptr = alloca i8 + store i8 42, i8* %ptr, !invariant.group !0 + call void @foo(i8* %ptr) + + %a = load i8, i8* %ptr, !invariant.group !0 ; Can assume that value under %ptr didn't change + call void @foo(i8* %ptr) + %b = load i8, i8* %ptr, !invariant.group !1 ; Can't assume anything, because group changed + + %newPtr = call i8* @getPointer(i8* %ptr) + %c = load i8, i8* %newPtr, !invariant.group !0 ; Can't assume anything, because we only have information about %ptr + + %unknownValue = load i8, i8* @unknownPtr + store i8 %unknownValue, i8* %ptr, !invariant.group !0 ; Can assume that %unknownValue == 42 + + call void @foo(i8* %ptr) + %newPtr2 = call i8* @llvm.invariant.group.barrier(i8* %ptr) + %d = load i8, i8* %newPtr2, !invariant.group !0 ; Can't step through invariant.group.barrier to get value of %ptr + + ... + declare void @foo(i8*) + declare i8* @getPointer(i8*) + declare i8* @llvm.invariant.group.barrier(i8*) + + !0 = !{!"magic ptr"} + !1 = !{!"other ptr"} + + + Module Flags Metadata ===================== @@ -4738,7 +5019,10 @@ control flow, not values (the one exception being the The terminator instructions are: ':ref:`ret <i_ret>`', ':ref:`br <i_br>`', ':ref:`switch <i_switch>`', ':ref:`indirectbr <i_indirectbr>`', ':ref:`invoke <i_invoke>`', -':ref:`resume <i_resume>`', and ':ref:`unreachable <i_unreachable>`'. +':ref:`resume <i_resume>`', ':ref:`catchswitch <i_catchswitch>`', +':ref:`catchret <i_catchret>`', +':ref:`cleanupret <i_cleanupret>`', +and ':ref:`unreachable <i_unreachable>`'. .. _i_ret: @@ -4970,7 +5254,7 @@ Syntax: :: <result> = invoke [cconv] [ret attrs] <ptr to function ty> <function ptr val>(<function args>) [fn attrs] - to label <normal label> unwind label <exception label> + [operand bundles] to label <normal label> unwind label <exception label> Overview: """"""""" @@ -5024,6 +5308,7 @@ This instruction requires several arguments: #. The optional :ref:`function attributes <fnattrs>` list. Only '``noreturn``', '``nounwind``', '``readonly``' and '``readnone``' attributes are valid here. +#. The optional :ref:`operand bundles <opbundles>` list. 
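+
+For example, an ``invoke`` carrying a deoptimization operand bundle (the
+bundle tag and its operands here are purely illustrative) might be written
+as:
+
+.. code-block:: llvm
+
+      %retval = invoke i32 @callee(i32 %x) [ "deopt"(i32 0, i32 %x) ]
+                   to label %normal unwind label %lpad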
Semantics: """""""""" @@ -5092,6 +5377,235 @@ Example: resume { i8*, i32 } %exn +.. _i_catchswitch: + +'``catchswitch``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind to caller + <resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind label <default> + +Overview: +""""""""" + +The '``catchswitch``' instruction is used by `LLVM's exception handling system +<ExceptionHandling.html#overview>`_ to describe the set of possible catch handlers +that may be executed by the :ref:`EH personality routine <personalityfn>`. + +Arguments: +"""""""""" + +The ``parent`` argument is the token of the funclet that contains the +``catchswitch`` instruction. If the ``catchswitch`` is not inside a funclet, +this operand may be the token ``none``. + +The ``default`` argument is the label of another basic block beginning with a +"pad" instruction, one of ``cleanuppad`` or ``catchswitch``. + +The ``handlers`` are a list of successor blocks that each begin with a +:ref:`catchpad <i_catchpad>` instruction. + +Semantics: +"""""""""" + +Executing this instruction transfers control to one of the successors in +``handlers``, if appropriate, or continues to unwind via the unwind label if +present. + +The ``catchswitch`` is both a terminator and a "pad" instruction, meaning that +it must be both the first non-phi instruction and last instruction in the basic +block. Therefore, it must be the only non-phi instruction in the block. + +Example: +"""""""" + +.. code-block:: llvm + + dispatch1: + %cs1 = catchswitch within none [label %handler0, label %handler1] unwind to caller + dispatch2: + %cs2 = catchswitch within %parenthandler [label %handler0] unwind label %cleanup + +.. _i_catchpad: + +'``catchpad``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <resultval> = catchpad within <catchswitch> [<args>*] + +Overview: +""""""""" + +The '``catchpad``' instruction is used by `LLVM's exception handling +system <ExceptionHandling.html#overview>`_ to specify that a basic block +begins a catch handler --- one where a personality routine attempts to transfer +control to catch an exception. + +Arguments: +"""""""""" + +The ``catchswitch`` operand must always be a token produced by a +:ref:`catchswitch <i_catchswitch>` instruction in a predecessor block. This +ensures that each ``catchpad`` has exactly one predecessor block, and it always +terminates in a ``catchswitch``. + +The ``args`` correspond to whatever information the personality routine +requires to know if this is an appropriate handler for the exception. Control +will transfer to the ``catchpad`` if this is the first appropriate handler for +the exception. + +The ``resultval`` has the type :ref:`token <t_token>` and is used to match the +``catchpad`` to corresponding :ref:`catchrets <i_catchret>` and other nested EH +pads. + +Semantics: +"""""""""" + +When the call stack is being unwound due to an exception being thrown, the +exception is compared against the ``args``. If it doesn't match, control will +not reach the ``catchpad`` instruction. The representation of ``args`` is +entirely target and personality function-specific. + +Like the :ref:`landingpad <i_landingpad>` instruction, the ``catchpad`` +instruction must be the first non-phi of its parent basic block. 
+ +The meaning of the tokens produced and consumed by ``catchpad`` and other "pad" +instructions is described in the +`Windows exception handling documentation <ExceptionHandling.html#wineh>`. + +Executing a ``catchpad`` instruction constitutes "entering" that pad. +The pad may then be "exited" in one of three ways: + +1) explicitly via a ``catchret`` that consumes it. Executing such a ``catchret`` + is undefined behavior if any descendant pads have been entered but not yet + exited. +2) implicitly via a call (which unwinds all the way to the current function's caller), + or via a ``catchswitch`` or a ``cleanupret`` that unwinds to caller. +3) implicitly via an unwind edge whose destination EH pad isn't a descendant of + the ``catchpad``. When the ``catchpad`` is exited in this manner, it is + undefined behavior if the destination EH pad has a parent which is not an + ancestor of the ``catchpad`` being exited. + +Example: +"""""""" + +.. code-block:: llvm + + dispatch: + %cs = catchswitch within none [label %handler0] unwind to caller + ;; A catch block which can catch an integer. + handler0: + %tok = catchpad within %cs [i8** @_ZTIi] + +.. _i_catchret: + +'``catchret``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + catchret from <token> to label <normal> + +Overview: +""""""""" + +The '``catchret``' instruction is a terminator instruction that has a +single successor. + + +Arguments: +"""""""""" + +The first argument to a '``catchret``' indicates which ``catchpad`` it +exits. It must be a :ref:`catchpad <i_catchpad>`. +The second argument to a '``catchret``' specifies where control will +transfer to next. + +Semantics: +"""""""""" + +The '``catchret``' instruction ends an existing (in-flight) exception whose +unwinding was interrupted with a :ref:`catchpad <i_catchpad>` instruction. The +:ref:`personality function <personalityfn>` gets a chance to execute arbitrary +code to, for example, destroy the active exception. Control then transfers to +``normal``. + +The ``token`` argument must be a token produced by a dominating ``catchpad`` +instruction. The ``catchret`` destroys the physical frame established by +``catchpad``, so executing multiple returns on the same token without +re-executing the ``catchpad`` will result in undefined behavior. +See :ref:`catchpad <i_catchpad>` for more details. + +Example: +"""""""" + +.. code-block:: llvm + + catchret from %catch label %continue + +.. _i_cleanupret: + +'``cleanupret``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + cleanupret from <value> unwind label <continue> + cleanupret from <value> unwind to caller + +Overview: +""""""""" + +The '``cleanupret``' instruction is a terminator instruction that has +an optional successor. + + +Arguments: +"""""""""" + +The '``cleanupret``' instruction requires one argument, which indicates +which ``cleanuppad`` it exits, and must be a :ref:`cleanuppad <i_cleanuppad>`. +It also has an optional successor, ``continue``. + +Semantics: +"""""""""" + +The '``cleanupret``' instruction indicates to the +:ref:`personality function <personalityfn>` that one +:ref:`cleanuppad <i_cleanuppad>` it transferred control to has ended. +It transfers control to ``continue`` or unwinds out of the function. + +The unwind destination ``continue``, if present, must be an EH pad +whose parent is either ``none`` or an ancestor of the ``cleanuppad`` +being returned from. 
This constitutes an exceptional exit from all +ancestors of the completed ``cleanuppad``, up to but not including +the parent of ``continue``. +See :ref:`cleanuppad <i_cleanuppad>` for more details. + +Example: +"""""""" + +.. code-block:: llvm + + cleanupret from %cleanup unwind to caller + cleanupret from %cleanup unwind label %continue + .. _i_unreachable: '``unreachable``' Instruction @@ -6165,7 +6679,7 @@ Arguments: """""""""" The first operand of an '``extractvalue``' instruction is a value of -:ref:`struct <t_struct>` or :ref:`array <t_array>` type. The operands are +:ref:`struct <t_struct>` or :ref:`array <t_array>` type. The other operands are constant indices to specify which value to extract in a similar manner as indices in a '``getelementptr``' instruction. @@ -6312,9 +6826,11 @@ Syntax: :: - <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !nonnull !<index>][, !dereferenceable !<index>][, !dereferenceable_or_null !<index>] - <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment> + <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !invariant.group !<index>][, !nonnull !<index>][, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !align !<align_node>] + <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment> [, !invariant.group !<index>] !<index> = !{ i32 1 } + !<deref_bytes_node> = !{i64 <dereferenceable_bytes>} + !<align_node> = !{ i64 <value_alignment> } Overview: """"""""" @@ -6331,17 +6847,16 @@ then the optimizer is not allowed to modify the number or order of execution of this ``load`` with other :ref:`volatile operations <volatile>`. -If the ``load`` is marked as ``atomic``, it takes an extra -:ref:`ordering <ordering>` and optional ``singlethread`` argument. The -``release`` and ``acq_rel`` orderings are not valid on ``load`` -instructions. Atomic loads produce :ref:`defined <memmodel>` results -when they may see multiple atomic stores. The type of the pointee must -be an integer type whose bit width is a power of two greater than or -equal to eight and less than or equal to a target-specific size limit. -``align`` must be explicitly specified on atomic loads, and the load has -undefined behavior if the alignment is not set to a value which is at -least the size in bytes of the pointee. ``!nontemporal`` does not have -any defined semantics for atomic loads. +If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering +<ordering>` and optional ``singlethread`` argument. The ``release`` and +``acq_rel`` orderings are not valid on ``load`` instructions. Atomic loads +produce :ref:`defined <memmodel>` results when they may see multiple atomic +stores. The type of the pointee must be an integer, pointer, or floating-point +type whose bit width is a power of two greater than or equal to eight and less +than or equal to a target-specific size limit. ``align`` must be explicitly +specified on atomic loads, and the load has undefined behavior if the alignment +is not set to a value which is at least the size in bytes of the +pointee. ``!nontemporal`` does not have any defined semantics for atomic loads. The optional constant ``align`` argument specifies the alignment of the operation (that is, the alignment of the memory address). 
A value of 0 @@ -6369,33 +6884,44 @@ Being invariant does not imply that a location is dereferenceable, but it does imply that once the location is known dereferenceable its value is henceforth unchanging. +The optional ``!invariant.group`` metadata must reference a single metadata name + ``<index>`` corresponding to a metadata node. See ``invariant.group`` metadata. + The optional ``!nonnull`` metadata must reference a single metadata name ``<index>`` corresponding to a metadata node with no entries. The existence of the ``!nonnull`` metadata on the instruction tells the optimizer that the value loaded is known to -never be null. This is analogous to the ''nonnull'' attribute -on parameters and return values. This metadata can only be applied +never be null. This is analogous to the ``nonnull`` attribute +on parameters and return values. This metadata can only be applied to loads of a pointer type. -The optional ``!dereferenceable`` metadata must reference a single -metadata name ``<index>`` corresponding to a metadata node with one ``i64`` -entry. The existence of the ``!dereferenceable`` metadata on the instruction +The optional ``!dereferenceable`` metadata must reference a single metadata +name ``<deref_bytes_node>`` corresponding to a metadata node with one ``i64`` +entry. The existence of the ``!dereferenceable`` metadata on the instruction tells the optimizer that the value loaded is known to be dereferenceable. -The number of bytes known to be dereferenceable is specified by the integer -value in the metadata node. This is analogous to the ''dereferenceable'' -attribute on parameters and return values. This metadata can only be applied +The number of bytes known to be dereferenceable is specified by the integer +value in the metadata node. This is analogous to the ''dereferenceable'' +attribute on parameters and return values. This metadata can only be applied to loads of a pointer type. The optional ``!dereferenceable_or_null`` metadata must reference a single -metadata name ``<index>`` corresponding to a metadata node with one ``i64`` -entry. The existence of the ``!dereferenceable_or_null`` metadata on the +metadata name ``<deref_bytes_node>`` corresponding to a metadata node with one +``i64`` entry. The existence of the ``!dereferenceable_or_null`` metadata on the instruction tells the optimizer that the value loaded is known to be either dereferenceable or null. -The number of bytes known to be dereferenceable is specified by the integer -value in the metadata node. This is analogous to the ''dereferenceable_or_null'' -attribute on parameters and return values. This metadata can only be applied +The number of bytes known to be dereferenceable is specified by the integer +value in the metadata node. This is analogous to the ''dereferenceable_or_null'' +attribute on parameters and return values. This metadata can only be applied to loads of a pointer type. +The optional ``!align`` metadata must reference a single metadata name +``<align_node>`` corresponding to a metadata node with one ``i64`` entry. +The existence of the ``!align`` metadata on the instruction tells the +optimizer that the value loaded is known to be aligned to a boundary specified +by the integer value in the metadata node. The alignment must be a power of 2. +This is analogous to the ''align'' attribute on parameters and return values. +This metadata can only be applied to loads of a pointer type. 
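+
+As a minimal illustration (not taken from real code; the value names and
+metadata node numbers are hypothetical), a pointer load might combine several
+of the metadata kinds described above:
+
+.. code-block:: llvm
+
+    %p = load i8*, i8** %pp, align 8, !nonnull !0, !dereferenceable !1, !align !2
+
+    ; module-level metadata nodes referenced above
+    !0 = !{}
+    !1 = !{i64 4}
+    !2 = !{i64 8}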
+ Semantics: """""""""" @@ -6426,8 +6952,8 @@ Syntax: :: - store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>] ; yields void - store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> ; yields void + store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.group !<index>] ; yields void + store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> [, !invariant.group !<index>] ; yields void Overview: """"""""" @@ -6445,17 +6971,16 @@ then the optimizer is not allowed to modify the number or order of execution of this ``store`` with other :ref:`volatile operations <volatile>`. -If the ``store`` is marked as ``atomic``, it takes an extra -:ref:`ordering <ordering>` and optional ``singlethread`` argument. The -``acquire`` and ``acq_rel`` orderings aren't valid on ``store`` -instructions. Atomic loads produce :ref:`defined <memmodel>` results -when they may see multiple atomic stores. The type of the pointee must -be an integer type whose bit width is a power of two greater than or -equal to eight and less than or equal to a target-specific size limit. -``align`` must be explicitly specified on atomic stores, and the store -has undefined behavior if the alignment is not set to a value which is -at least the size in bytes of the pointee. ``!nontemporal`` does not -have any defined semantics for atomic stores. +If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering +<ordering>` and optional ``singlethread`` argument. The ``acquire`` and +``acq_rel`` orderings aren't valid on ``store`` instructions. Atomic loads +produce :ref:`defined <memmodel>` results when they may see multiple atomic +stores. The type of the pointee must be an integer, pointer, or floating-point +type whose bit width is a power of two greater than or equal to eight and less +than or equal to a target-specific size limit. ``align`` must be explicitly +specified on atomic stores, and the store has undefined behavior if the +alignment is not set to a value which is at least the size in bytes of the +pointee. ``!nontemporal`` does not have any defined semantics for atomic stores. The optional constant ``align`` argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 @@ -6474,6 +6999,9 @@ be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86. +The optional ``!invariant.group`` metadata must reference a +single metadata name ``<index>``. See ``invariant.group`` metadata. + Semantics: """""""""" @@ -6869,15 +7397,15 @@ will be effectively broadcast into a vector during address calculation. 
; All arguments are vectors: ; A[i] = ptrs[i] + offsets[i]*sizeof(i8) %A = getelementptr i8, <4 x i8*> %ptrs, <4 x i64> %offsets - + ; Add the same scalar offset to each pointer of a vector: ; A[i] = ptrs[i] + offset*sizeof(i8) %A = getelementptr i8, <4 x i8*> %ptrs, i64 %offset - + ; Add distinct offsets to the same pointer: ; A[i] = ptr + offsets[i]*sizeof(i8) %A = getelementptr i8, i8* %ptr, <4 x i64> %offsets - + ; In all cases described above the type of the result is <4 x i8*> The two following instructions are equivalent: @@ -6889,7 +7417,7 @@ The two following instructions are equivalent: <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> %ind4, <4 x i64> <i64 13, i64 13, i64 13, i64 13> - + getelementptr %struct.ST, <4 x %struct.ST*> %s, <4 x i64> %ind1, i32 2, i32 1, <4 x i32> %ind4, i64 13 @@ -7068,10 +7596,12 @@ implies that ``fptrunc`` cannot be used to make a *no-op cast*. Semantics: """""""""" -The '``fptrunc``' instruction truncates a ``value`` from a larger +The '``fptrunc``' instruction casts a ``value`` from a larger :ref:`floating point <t_floating>` type to a smaller :ref:`floating -point <t_floating>` type. If the value cannot fit within the -destination type, ``ty2``, then the results are undefined. +point <t_floating>` type. If the value cannot fit (i.e. overflows) within the +destination type, ``ty2``, then the results are undefined. If the cast produces +an inexact result, how rounding is performed (e.g. truncation, also known as +round to zero) is undefined. Example: """""""" @@ -7403,7 +7933,7 @@ The '``bitcast``' instruction takes a value to cast, which must be a non-aggregate first class value, and a type to cast it to, which must also be a non-aggregate :ref:`first class <t_firstclass>` type. The bit sizes of ``value`` and the destination type, ``ty2``, must be -identical. If the source type is a pointer, the destination type must +identical. If the source type is a pointer, the destination type must also be a pointer of the same size. This instruction supports bitwise conversion of vectors to integers and to vectors of other types (as long as they have the same size). @@ -7800,7 +8330,8 @@ Syntax: :: - <result> = [tail | musttail] call [cconv] [ret attrs] <ty> [<fnty>*] <fnptrval>(<function args>) [fn attrs] + <result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] <ty> [<fnty>*] <fnptrval>(<function args>) [fn attrs] + [ operand bundles ] Overview: """"""""" @@ -7813,10 +8344,10 @@ Arguments: This instruction requires several arguments: #. The optional ``tail`` and ``musttail`` markers indicate that the optimizers - should perform tail call optimization. The ``tail`` marker is a hint that - `can be ignored <CodeGenerator.html#sibcallopt>`_. The ``musttail`` marker + should perform tail call optimization. The ``tail`` marker is a hint that + `can be ignored <CodeGenerator.html#sibcallopt>`_. The ``musttail`` marker means that the call must be tail call optimized in order for the program to - be correct. The ``musttail`` marker provides these guarantees: + be correct. The ``musttail`` marker provides these guarantees: #. The call will not cause unbounded stack growth if it is part of a recursive cycle in the call graph. @@ -7824,14 +8355,14 @@ This instruction requires several arguments: forwarded in place. Both markers imply that the callee does not access allocas or varargs from - the caller. Calls marked ``musttail`` must obey the following additional + the caller. 
Calls marked ``musttail`` must obey the following additional rules: - The call must immediately precede a :ref:`ret <i_ret>` instruction, or a pointer bitcast followed by a ret instruction. - The ret instruction must return the (possibly bitcasted) value produced by the call or void. - - The caller and callee prototypes must match. Pointer types of + - The caller and callee prototypes must match. Pointer types of parameters or return types may differ in pointee type, but not in address space. - The calling conventions of the caller and callee must match. @@ -7852,6 +8383,15 @@ This instruction requires several arguments: - `Platform-specific constraints are met. <CodeGenerator.html#tailcallopt>`_ +#. The optional ``notail`` marker indicates that the optimizers should not add + ``tail`` or ``musttail`` markers to the call. It is used to prevent tail + call optimization from being performed on the call. + +#. The optional ``fast-math flags`` marker indicates that the call has one or more + :ref:`fast-math flags <fastmath>`, which are optimization hints to enable + otherwise unsafe floating-point optimizations. Fast-math flags are only valid + for calls that return a floating-point scalar or vector type. + #. The optional "cconv" marker indicates which :ref:`calling convention <callingconv>` the call should use. If none is specified, the call defaults to using C calling conventions. The @@ -7880,6 +8420,7 @@ This instruction requires several arguments: #. The optional :ref:`function attributes <fnattrs>` list. Only '``noreturn``', '``nounwind``', '``readonly``' and '``readnone``' attributes are valid here. +#. The optional :ref:`operand bundles <opbundles>` list. Semantics: """""""""" @@ -8049,6 +8590,84 @@ Example: catch i8** @_ZTIi filter [1 x i8**] [@_ZTId] +.. _i_cleanuppad: + +'``cleanuppad``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <resultval> = cleanuppad within <parent> [<args>*] + +Overview: +""""""""" + +The '``cleanuppad``' instruction is used by `LLVM's exception handling +system <ExceptionHandling.html#overview>`_ to specify that a basic block +is a cleanup block --- one where a personality routine attempts to +transfer control to run cleanup actions. +The ``args`` correspond to whatever additional +information the :ref:`personality function <personalityfn>` requires to +execute the cleanup. +The ``resultval`` has the type :ref:`token <t_token>` and is used to +match the ``cleanuppad`` to corresponding :ref:`cleanuprets <i_cleanupret>`. +The ``parent`` argument is the token of the funclet that contains the +``cleanuppad`` instruction. If the ``cleanuppad`` is not inside a funclet, +this operand may be the token ``none``. + +Arguments: +"""""""""" + +The instruction takes a list of arbitrary values which are interpreted +by the :ref:`personality function <personalityfn>`. + +Semantics: +"""""""""" + +When the call stack is being unwound due to an exception being thrown, +the :ref:`personality function <personalityfn>` transfers control to the +``cleanuppad`` with the aid of the personality-specific arguments. +As with calling conventions, how the personality function results are +represented in LLVM IR is target specific. + +The ``cleanuppad`` instruction has several restrictions: + +- A cleanup block is a basic block which is the unwind destination of + an exceptional instruction. +- A cleanup block must have a '``cleanuppad``' instruction as its + first non-PHI instruction. 
+- There can be only one '``cleanuppad``' instruction within the + cleanup block. +- A basic block that is not a cleanup block may not include a + '``cleanuppad``' instruction. + +Executing a ``cleanuppad`` instruction constitutes "entering" that pad. +The pad may then be "exited" in one of three ways: + +1) explicitly via a ``cleanupret`` that consumes it. Executing such a ``cleanupret`` + is undefined behavior if any descendant pads have been entered but not yet + exited. +2) implicitly via a call (which unwinds all the way to the current function's caller), + or via a ``catchswitch`` or a ``cleanupret`` that unwinds to caller. +3) implicitly via an unwind edge whose destination EH pad isn't a descendant of + the ``cleanuppad``. When the ``cleanuppad`` is exited in this manner, it is + undefined behavior if the destination EH pad has a parent which is not an + ancestor of the ``cleanuppad`` being exited. + +It is undefined behavior for the ``cleanuppad`` to exit via an unwind edge which +does not transitively unwind to the same destination as a constituent +``cleanupret``. + +Example: +"""""""" + +.. code-block:: llvm + + %tok = cleanuppad within %cs [] + .. _intrinsics: Intrinsic Functions @@ -8265,11 +8884,11 @@ Experimental Statepoint Intrinsics ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ LLVM provides an second experimental set of intrinsics for describing garbage -collection safepoints in compiled code. These intrinsics are an alternative +collection safepoints in compiled code. These intrinsics are an alternative to the ``llvm.gcroot`` intrinsics, but are compatible with the ones for -:ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers. The +:ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers. The differences in approach are covered in the `Garbage Collection with LLVM -<GarbageCollection.html>`_ documentation. The intrinsics themselves are +<GarbageCollection.html>`_ documentation. The intrinsics themselves are described in :doc:`Statepoints`. .. _int_gcroot: @@ -8613,6 +9232,48 @@ Semantics: See the description for :ref:`llvm.stacksave <int_stacksave>`. +.. _int_get_dynamic_area_offset: + +'``llvm.get.dynamic.area.offset``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.get.dynamic.area.offset.i32() + declare i64 @llvm.get.dynamic.area.offset.i64() + + Overview: + """"""""" + + The '``llvm.get.dynamic.area.offset.*``' intrinsic family is used to + get the offset from native stack pointer to the address of the most + recent dynamic alloca on the caller's stack. These intrinsics are + intendend for use in combination with + :ref:`llvm.stacksave <int_stacksave>` to get a + pointer to the most recent dynamic alloca. This is useful, for example, + for AddressSanitizer's stack unpoisoning routines. + +Semantics: +"""""""""" + + These intrinsics return a non-negative integer value that can be used to + get the address of the most recent dynamic alloca, allocated by :ref:`alloca <i_alloca>` + on the caller's stack. In particular, for targets where stack grows downwards, + adding this offset to the native stack pointer would get the address of the most + recent dynamic alloca. For targets where stack grows upwards, the situation is a bit more + complicated, because substracting this value from stack pointer would get the address + one past the end of the most recent dynamic alloca. 
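+
+ As a minimal sketch of the intended use (assuming a 64-bit target whose stack
+ grows downwards, that ``llvm.stacksave`` yields the native stack pointer on
+ that target, and with illustrative value names), the address of the most
+ recent dynamic alloca could be computed like::
+
+      %sp   = call i8* @llvm.stacksave()
+      %off  = call i64 @llvm.get.dynamic.area.offset.i64()
+      ; stack grows down, so the alloca address is stack pointer + offset
+      %addr = getelementptr i8, i8* %sp, i64 %off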
+ + Although for most targets `llvm.get.dynamic.area.offset <int_get_dynamic_area_offset>` + returns just a zero, for others, such as PowerPC and PowerPC64, it returns a + compile-time-known constant value. + + The return value type of :ref:`llvm.get.dynamic.area.offset <int_get_dynamic_area_offset>` + must match the target's generic address space's (address space 0) pointer type. + '``llvm.prefetch``' Intrinsic ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -8791,6 +9452,55 @@ structures and the code to increment the appropriate value, in a format that can be written out by a compiler runtime and consumed via the ``llvm-profdata`` tool. +'``llvm.instrprof_value_profile``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.instrprof_value_profile(i8* <name>, i64 <hash>, + i64 <value>, i32 <value_kind>, + i32 <index>) + +Overview: +""""""""" + +The '``llvm.instrprof_value_profile``' intrinsic can be emitted by a +frontend for use with instrumentation based profiling. This will be +lowered by the ``-instrprof`` pass to find out the target values, +instrumented expressions take in a program at runtime. + +Arguments: +"""""""""" + +The first argument is a pointer to a global variable containing the +name of the entity being instrumented. ``name`` should generally be the +(mangled) function name for a set of counters. + +The second argument is a hash value that can be used by the consumer +of the profile data to detect changes to the instrumented source. It +is an error if ``hash`` differs between two instances of +``llvm.instrprof_*`` that refer to the same name. + +The third argument is the value of the expression being profiled. The profiled +expression's value should be representable as an unsigned 64-bit value. The +fourth argument represents the kind of value profiling that is being done. The +supported value profiling kinds are enumerated through the +``InstrProfValueKind`` type declared in the +``<include/llvm/ProfileData/InstrProf.h>`` header file. The last argument is the +index of the instrumented expression within ``name``. It should be >= 0. + +Semantics: +"""""""""" + +This intrinsic represents the point where a call to a runtime routine +should be inserted for value profiling of target expressions. ``-instrprof`` +pass will generate the appropriate data structures and replace the +``llvm.instrprof_value_profile`` intrinsic with the call to the profile +runtime library with proper arguments. + Standard C Library Intrinsics ----------------------------- @@ -9734,6 +10444,34 @@ Bit Manipulation Intrinsics LLVM provides intrinsics for a few important bit manipulation operations. These allow efficient code generation for some algorithms. +'``llvm.bitreverse.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic function. You can use bitreverse on any +integer type. + +:: + + declare i16 @llvm.bitreverse.i16(i16 <id>) + declare i32 @llvm.bitreverse.i32(i32 <id>) + declare i64 @llvm.bitreverse.i64(i64 <id>) + +Overview: +""""""""" + +The '``llvm.bitreverse``' family of intrinsics is used to reverse the +bitpattern of an integer value; for example ``0b1234567`` becomes +``0b7654321``. + +Semantics: +"""""""""" + +The ``llvm.bitreverse.iN`` intrinsic returns an i16 value that has bit +``M`` in the input moved to bit ``N-M`` in the output. 
+ '``llvm.bswap.*``' Intrinsics ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -10225,23 +10963,23 @@ Overview: """"""""" The '``llvm.canonicalize.*``' intrinsic returns the platform specific canonical -encoding of a floating point number. This canonicalization is useful for +encoding of a floating point number. This canonicalization is useful for implementing certain numeric primitives such as frexp. The canonical encoding is defined by IEEE-754-2008 to be: :: 2.1.8 canonical encoding: The preferred encoding of a floating-point - representation in a format. Applied to declets, significands of finite + representation in a format. Applied to declets, significands of finite numbers, infinities, and NaNs, especially in decimal formats. This operation can also be considered equivalent to the IEEE-754-2008 -conversion of a floating-point value to the same format. NaNs are handled +conversion of a floating-point value to the same format. NaNs are handled according to section 6.2. Examples of non-canonical encodings: -- x87 pseudo denormals, pseudo NaNs, pseudo Infinity, Unnormals. These are +- x87 pseudo denormals, pseudo NaNs, pseudo Infinity, Unnormals. These are converted to a canonical representation per hardware-specific protocol. - Many normal decimal floating point numbers have non-canonical alternative encodings. @@ -10254,11 +10992,11 @@ default exception handling must signal an invalid exception, and produce a quiet NaN result. This function should always be implementable as multiplication by 1.0, provided -that the compiler does not constant fold the operation. Likewise, division by -1.0 and ``llvm.minnum(x, x)`` are possible implementations. Addition with +that the compiler does not constant fold the operation. Likewise, division by +1.0 and ``llvm.minnum(x, x)`` are possible implementations. Addition with -0.0 is also sufficient provided that the rounding mode is not -Infinity. -``@llvm.canonicalize`` must preserve the equality relation. That is: +``@llvm.canonicalize`` must preserve the equality relation. That is: - ``(@llvm.canonicalize(x) == x)`` is equivalent to ``(x == x)`` - ``(@llvm.canonicalize(x) == @llvm.canonicalize(y))`` is equivalent to @@ -10269,15 +11007,15 @@ Additionally, the sign of zero must be conserved: The payload bits of a NaN must be conserved, with two exceptions. First, environments which use only a single canonical representation of NaN -must perform said canonicalization. Second, SNaNs must be quieted per the +must perform said canonicalization. Second, SNaNs must be quieted per the usual methods. The canonicalization operation may be optimized away if: -- The input is known to be canonical. For example, it was produced by a +- The input is known to be canonical. For example, it was produced by a floating-point operation that is required by the standard to be canonical. - The result is consumed only by (or fused with) other floating-point - operations. That is, the bits of the floating point value are not examined. + operations. That is, the bits of the floating point value are not examined. '``llvm.fmuladd.*``' Intrinsic ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -10566,12 +11304,16 @@ LLVM provides intrinsics for predicated vector load and store operations. The pr Syntax: """"""" -This is an overloaded intrinsic. The loaded data is a vector of any integer or floating point data type. +This is an overloaded intrinsic. The loaded data is a vector of any integer, floating point or pointer data type. 
:: - declare <16 x float> @llvm.masked.load.v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>) - declare <2 x double> @llvm.masked.load.v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>) + declare <16 x float> @llvm.masked.load.v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>) + declare <2 x double> @llvm.masked.load.v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>) + ;; The data is a vector of pointers to double + declare <8 x double*> @llvm.masked.load.v8p0f64 (<8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x double*> <passthru>) + ;; The data is a vector of function pointers + declare <8 x i32 ()*> @llvm.masked.load.v8p0f_i32f (<8 x i32 ()*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x i32 ()*> <passthru>) Overview: """"""""" @@ -10607,12 +11349,16 @@ The result of this operation is equivalent to a regular vector load instruction Syntax: """"""" -This is an overloaded intrinsic. The data stored in memory is a vector of any integer or floating point data type. +This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating point or pointer data type. :: - declare void @llvm.masked.store.v8i32 (<8 x i32> <value>, <8 x i32> * <ptr>, i32 <alignment>, <8 x i1> <mask>) - declare void @llvm.masked.store.v16f32(<16 x i32> <value>, <16 x i32>* <ptr>, i32 <alignment>, <16 x i1> <mask>) + declare void @llvm.masked.store.v8i32 (<8 x i32> <value>, <8 x i32>* <ptr>, i32 <alignment>, <8 x i1> <mask>) + declare void @llvm.masked.store.v16f32 (<16 x float> <value>, <16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>) + ;; The data is a vector of pointers to double + declare void @llvm.masked.store.v8p0f64 (<8 x double*> <value>, <8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>) + ;; The data is a vector of function pointers + declare void @llvm.masked.store.v4p0f_i32f (<4 x i32 ()*> <value>, <4 x i32 ()*>* <ptr>, i32 <alignment>, <4 x i1> <mask>) Overview: """"""""" @@ -10653,12 +11399,13 @@ LLVM provides intrinsics for vector gather and scatter operations. They are simi Syntax: """"""" -This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer or floating point data type gathered together into one vector. +This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer, floating point or pointer data type gathered together into one vector. :: - declare <16 x float> @llvm.masked.gather.v16f32 (<16 x float*> <ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>) - declare <2 x double> @llvm.masked.gather.v2f64 (<2 x double*> <ptrs>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>) + declare <16 x float> @llvm.masked.gather.v16f32 (<16 x float*> <ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>) + declare <2 x double> @llvm.masked.gather.v2f64 (<2 x double*> <ptrs>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>) + declare <8 x float*> @llvm.masked.gather.v8p0f32 (<8 x float**> <ptrs>, i32 <alignment>, <8 x i1> <mask>, <8 x float*> <passthru>) Overview: """"""""" @@ -10706,12 +11453,13 @@ The semantics of this operation are equivalent to a sequence of conditional scal Syntax: """"""" -This is an overloaded intrinsic. The data stored in memory is a vector of any integer or floating point data type. Each vector element is stored in an arbitrary memory addresses. 
Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element. +This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element. :: - declare void @llvm.masked.scatter.v8i32 (<8 x i32> <value>, <8 x i32*> <ptrs>, i32 <alignment>, <8 x i1> <mask>) - declare void @llvm.masked.scatter.v16f32(<16 x i32> <value>, <16 x i32*> <ptrs>, i32 <alignment>, <16 x i1> <mask>) + declare void @llvm.masked.scatter.v8i32 (<8 x i32> <value>, <8 x i32*> <ptrs>, i32 <alignment>, <8 x i1> <mask>) + declare void @llvm.masked.scatter.v16f32 (<16 x float> <value>, <16 x float*> <ptrs>, i32 <alignment>, <16 x i1> <mask>) + declare void @llvm.masked.scatter.v4p0f64 (<4 x double*> <value>, <4 x double**> <ptrs>, i32 <alignment>, <4 x i1> <mask>) Overview: """"""""" @@ -10727,7 +11475,7 @@ The first operand is a vector value to be written to memory. The second operand Semantics: """""""""" -The '``llvm.masked.scatter``' intrinsics is designed for writing selected vector elements to arbitrary memory addresses in a single IR operation. The operation may be conditional, when not all bits in the mask are switched on. It is useful for targets that support vector masked scatter and allows vectorizing basic blocks with data and control divergency. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations. +The '``llvm.masked.scatter``' intrinsics is designed for writing selected vector elements to arbitrary memory addresses in a single IR operation. The operation may be conditional, when not all bits in the mask are switched on. It is useful for targets that support vector masked scatter and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations. :: @@ -10881,6 +11629,36 @@ Semantics: This intrinsic indicates that the memory is mutable again. +'``llvm.invariant.group.barrier``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8* @llvm.invariant.group.barrier(i8* <ptr>) + +Overview: +""""""""" + +The '``llvm.invariant.group.barrier``' intrinsic can be used when an invariant +established by invariant.group metadata no longer holds, to obtain a new pointer +value that does not carry the invariant information. + + +Arguments: +"""""""""" + +The ``llvm.invariant.group.barrier`` takes only one argument, which is +the pointer to the memory for which the ``invariant.group`` no longer holds. + +Semantics: +"""""""""" + +Returns another pointer that aliases its argument but which is considered different +for the purposes of ``load``/``store`` ``invariant.group`` metadata. + General Intrinsics ------------------ @@ -11253,7 +12031,7 @@ Arguments: """""""""" The first argument is a pointer to be tested. The second argument is a -metadata string containing the name of a :doc:`bitset <BitSets>`. +metadata object representing an identifier for a :doc:`bitset <BitSets>`. 
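+
+For example, a call testing whether ``%ptr`` is a member of a hypothetical
+bitset identified by the string ``"bitset1"`` might look like::
+
+      %is_member = call i1 @llvm.bitset.test(i8* %ptr, metadata !"bitset1")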
Overview: """"""""" diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst index 1ac75a406985..84adff3616f7 100644 --- a/docs/LibFuzzer.rst +++ b/docs/LibFuzzer.rst @@ -21,7 +21,8 @@ This library is intended primarily for in-process coverage-guided fuzz testing optimizations options (e.g. -O0, -O1, -O2) to diversify testing. * Build a test driver using the same options as the library. The test driver is a C/C++ file containing interesting calls to the library - inside a single function ``extern "C" void LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);`` + inside a single function ``extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);``. + Currently, the only expected return value is 0, others are reserved for future. * Link the Fuzzer, the library and the driver together into an executable using the same sanitizer options as for the library. * Collect the initial corpus of inputs for the @@ -60,14 +61,18 @@ The most important flags are:: cross_over 1 If 1, cross over inputs. mutate_depth 5 Apply this number of consecutive mutations to each input. timeout 1200 Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort. + max_total_time 0 If positive, indicates the maximal total time in seconds to run the fuzzer. help 0 Print help. - save_minimized_corpus 0 If 1, the minimized corpus is saved into the first input directory + merge 0 If 1, the 2-nd, 3-rd, etc corpora will be merged into the 1-st corpus. Only interesting units will be taken. jobs 0 Number of jobs to run. If jobs >= 1 we spawn this number of jobs in separate worker processes with stdout/stderr redirected to fuzz-JOB.log. workers 0 Number of simultaneous worker processes to run the jobs. If zero, "min(jobs,NumberOfCpuCores()/2)" is used. - tokens 0 Use the file with tokens (one token per line) to fuzz a token based input language. - apply_tokens 0 Read the given input file, substitute bytes with tokens and write the result to stdout. sync_command 0 Execute an external command "<sync_command> <test_corpus>" to synchronize the test corpus. sync_timeout 600 Minimum timeout between syncs. + use_traces 0 Experimental: use instruction traces + only_ascii 0 If 1, generate only ASCII (isprint+isspace) inputs. + test_single_input "" Use specified file content as test input. Test will be run only once. Useful for debugging a particular case. + artifact_prefix "" Write fuzzing artifacts (crash, timeout, or slow inputs) as $(artifact_prefix)file + exact_artifact_path "" Write the single artifact on failure (crash, timeout) as $(exact_artifact_path). This overrides -artifact_prefix and will not use checksum in the file name. Do not use the same path for several parallel processes. For the full list of flags run the fuzzer binary with ``-help=1``. @@ -80,11 +85,14 @@ Toy example A simple function that does something interesting if it receives the input "HI!":: cat << EOF >> test_fuzzer.cc - extern "C" void LLVMFuzzerTestOneInput(const unsigned char *data, unsigned long size) { + #include <stdint.h> + #include <stddef.h> + extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { if (size > 0 && data[0] == 'H') if (size > 1 && data[1] == 'I') if (size > 2 && data[2] == '!') __builtin_trap(); + return 0; } EOF # Get lib/Fuzzer. Assuming that you already have fresh clang in PATH. @@ -115,9 +123,10 @@ Here we show how to use lib/Fuzzer on something real, yet simple: pcre2_:: # Build the actual function that does something interesting with PCRE2. 
cat << EOF > pcre_fuzzer.cc #include <string.h> + #include <stdint.h> #include "pcre2posix.h" - extern "C" void LLVMFuzzerTestOneInput(const unsigned char *data, size_t size) { - if (size < 1) return; + extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { + if (size < 1) return 0; char *str = new char[size+1]; memcpy(str, data, size); str[size] = 0; @@ -127,6 +136,7 @@ Here we show how to use lib/Fuzzer on something real, yet simple: pcre2_:: regfree(&preg); } delete [] str; + return 0; } EOF clang++ -g -fsanitize=address $COV_FLAGS -c -std=c++11 -I inst/include/ pcre_fuzzer.cc @@ -213,6 +223,9 @@ to find Heartbleed with LibFuzzer:: #include <openssl/ssl.h> #include <openssl/err.h> #include <assert.h> + #include <stdint.h> + #include <stddef.h> + SSL_CTX *sctx; int Init() { SSL_library_init(); @@ -224,7 +237,7 @@ to find Heartbleed with LibFuzzer:: assert (SSL_CTX_use_PrivateKey_file(sctx, "server.key", SSL_FILETYPE_PEM)); return 0; } - extern "C" void LLVMFuzzerTestOneInput(unsigned char *Data, size_t Size) { + extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) { static int unused = Init(); SSL *server = SSL_new(sctx); BIO *sinbio = BIO_new(BIO_s_mem()); @@ -234,9 +247,10 @@ to find Heartbleed with LibFuzzer:: BIO_write(sinbio, Data, Size); SSL_do_handshake(server); SSL_free(server); + return 0; } EOF - # Build the fuzzer. + # Build the fuzzer. clang++ -g handshake-fuzz.cc -fsanitize=address \ openssl-1.0.1f/libssl.a openssl-1.0.1f/libcrypto.a Fuzzer*.o # Run 20 independent fuzzer jobs. @@ -252,26 +266,43 @@ Voila:: #1 0x4db504 in tls1_process_heartbeat openssl-1.0.1f/ssl/t1_lib.c:2586:3 #2 0x580be3 in ssl3_read_bytes openssl-1.0.1f/ssl/s3_pkt.c:1092:4 +Note: a `similar fuzzer <https://boringssl.googlesource.com/boringssl/+/HEAD/FUZZING.md>`_ +is now a part of the boringssl source tree. + Advanced features ================= -Tokens ------- - -By default, the fuzzer is not aware of complexities of the input language -and when fuzzing e.g. a C++ parser it will mostly stress the lexer. -It is very hard for the fuzzer to come up with something like ``reinterpret_cast<int>`` -from a test corpus that doesn't have it. -See a detailed discussion of this topic at -http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html. - -lib/Fuzzer implements a simple technique that allows to fuzz input languages with -long tokens. All you need is to prepare a text file containing up to 253 tokens, one token per line, -and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``. -Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``. -The fuzzer itself will still be mutating a string of bytes -but before passing this input to the target library it will replace every byte ``b`` with the ``b``-th token. -If there are less than ``b`` tokens, a space will be added instead. +Dictionaries +------------ +*EXPERIMENTAL*. +LibFuzzer supports user-supplied dictionaries with input language keywords +or other interesting byte sequences (e.g. multi-byte magic values). +Use ``-dict=DICTIONARY_FILE``. For some input languages using a dictionary +may significantly improve the search speed. +The dictionary syntax is similar to that used by AFL_ for its ``-x`` option:: + + # Lines starting with '#' and empty lines are ignored. + + # Adds "blah" (w/o quotes) to the dictionary. + kw1="blah" + # Use \\ for backslash and \" for quotes. 
+ kw2="\"ac\\dc\"" + # Use \xAB for hex values + kw3="\xF7\xF8" + # the name of the keyword followed by '=' may be omitted: + "foo\x0Abar" + +Data-flow-guided fuzzing +------------------------ + +*EXPERIMENTAL*. +With an additional compiler flag ``-fsanitize-coverage=trace-cmp`` (see SanitizerCoverageTraceDataFlow_) +and extra run-time flag ``-use_traces=1`` the fuzzer will try to apply *data-flow-guided fuzzing*. +That is, the fuzzer will record the inputs to comparison instructions, switch statements, +and several libc functions (``memcmp``, ``strcmp``, ``strncmp``, etc). +It will later use those recorded inputs during mutations. + +This mode can be combined with DataFlowSanitizer_ to achieve better sensitivity. AFL compatibility ----------------- @@ -321,18 +352,38 @@ Build (make sure to use fresh clang as the host compiler):: Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc). -TODO: commit the pre-fuzzed corpus to svn (?). - Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052 clang-fuzzer ------------ -The default behavior is very similar to ``clang-format-fuzzer``. -Clang can also be fuzzed with Tokens_ using ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option. +The behavior is very similar to ``clang-format-fuzzer``. Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057 +llvm-as-fuzzer +-------------- + +Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=24639 + +llvm-mc-fuzzer +-------------- + +This tool fuzzes the MC layer. Currently it is only able to fuzz the +disassembler but it is hoped that assembly, and round-trip verification will be +added in future. + +When run in dissassembly mode, the inputs are opcodes to be disassembled. The +fuzzer will consume as many instructions as possible and will stop when it +finds an invalid instruction or runs out of data. + +Please note that the command line interface differs slightly from that of other +fuzzers. The fuzzer arguments should follow ``--fuzzer-args`` and should have +a single dash, while other arguments control the operation mode and target in a +similar manner to ``llvm-mc`` and should have two dashes. For example:: + + llvm-mc-fuzzer --triple=aarch64-linux-gnu --disassemble --fuzzer-args -max_len=4 -jobs=10 + Buildbot -------- @@ -348,7 +399,7 @@ The corpuses are stored in git on github and can be used like this:: git clone https://github.com/kcc/fuzzing-with-sanitizers.git bin/clang-format-fuzzer fuzzing-with-sanitizers/llvm/clang-format/C1 bin/clang-fuzzer fuzzing-with-sanitizers/llvm/clang/C1/ - bin/clang-fuzzer fuzzing-with-sanitizers/llvm/clang/TOK1 -tokens=$LLVM/llvm/lib/Fuzzer/cxx_fuzzer_tokens.txt + bin/llvm-as-fuzzer fuzzing-with-sanitizers/llvm/llvm-as/C1 -only_ascii=1 FAQ @@ -407,11 +458,46 @@ small inputs, each input takes < 1ms to run, and the library code is not expecte to crash on invalid inputs. Examples: regular expression matchers, text or binary format parsers. 
+Trophies +======== +* GLIBC: https://sourceware.org/glibc/wiki/FuzzingLibc + +* MUSL LIBC: + + * http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c + * http://www.openwall.com/lists/oss-security/2015/03/30/3 + +* `pugixml <https://github.com/zeux/pugixml/issues/39>`_ + +* PCRE: Search for "LLVM fuzzer" in http://vcs.pcre.org/pcre2/code/trunk/ChangeLog?view=markup; + also in `bugzilla <https://bugs.exim.org/buglist.cgi?bug_status=__all__&content=libfuzzer&no_redirect=1&order=Importance&product=PCRE&query_format=specific>`_ + +* `ICU <http://bugs.icu-project.org/trac/ticket/11838>`_ + +* `Freetype <https://savannah.nongnu.org/search/?words=LibFuzzer&type_of_search=bugs&Search=Search&exact=1#options>`_ + +* `Harfbuzz <https://github.com/behdad/harfbuzz/issues/139>`_ + +* `SQLite <http://www3.sqlite.org/cgi/src/info/088009efdd56160b>`_ + +* `Python <http://bugs.python.org/issue25388>`_ + +* OpenSSL/BoringSSL: `[1] <https://boringssl.googlesource.com/boringssl/+/cb852981cd61733a7a1ae4fd8755b7ff950e857d>`_ + +* `Libxml2 + <https://bugzilla.gnome.org/buglist.cgi?bug_status=__all__&content=libFuzzer&list_id=68957&order=Importance&product=libxml2&query_format=specific>`_ + +* `Linux Kernel's BPF verifier <https://github.com/iovisor/bpf-fuzzer>`_ + +* LLVM: `Clang <https://llvm.org/bugs/show_bug.cgi?id=23057>`_, `Clang-format <https://llvm.org/bugs/show_bug.cgi?id=23052>`_, `libc++ <https://llvm.org/bugs/show_bug.cgi?id=24411>`_, `llvm-as <https://llvm.org/bugs/show_bug.cgi?id=24639>`_, Disassembler: http://reviews.llvm.org/rL247405, http://reviews.llvm.org/rL247414, http://reviews.llvm.org/rL247416, http://reviews.llvm.org/rL247417, http://reviews.llvm.org/rL247420, http://reviews.llvm.org/rL247422. + .. _pcre2: http://www.pcre.org/ .. _AFL: http://lcamtuf.coredump.cx/afl/ .. _SanitizerCoverage: http://clang.llvm.org/docs/SanitizerCoverage.html +.. _SanitizerCoverageTraceDataFlow: http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow +.. _DataFlowSanitizer: http://clang.llvm.org/docs/DataFlowSanitizer.html .. _Heartbleed: http://en.wikipedia.org/wiki/Heartbleed diff --git a/docs/MIRLangRef.rst b/docs/MIRLangRef.rst new file mode 100644 index 000000000000..a5f8c8c743ab --- /dev/null +++ b/docs/MIRLangRef.rst @@ -0,0 +1,495 @@ +======================================== +Machine IR (MIR) Format Reference Manual +======================================== + +.. contents:: + :local: + +.. warning:: + This is a work in progress. + +Introduction +============ + +This document is a reference manual for the Machine IR (MIR) serialization +format. MIR is a human readable serialization format that is used to represent +LLVM's :ref:`machine specific intermediate representation +<machine code representation>`. + +The MIR serialization format is designed to be used for testing the code +generation passes in LLVM. + +Overview +======== + +The MIR serialization format uses a YAML container. YAML is a standard +data serialization language, and the full YAML language spec can be read at +`yaml.org +<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_. + +A MIR file is split up into a series of `YAML documents`_. The first document +can contain an optional embedded LLVM IR module, and the rest of the documents +contain the serialized machine functions. + +.. 
_YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132 + +MIR Testing Guide +================= + +You can use the MIR format for testing in two different ways: + +- You can write MIR tests that invoke a single code generation pass using the + ``run-pass`` option in llc. + +- You can use llc's ``stop-after`` option with existing or new LLVM assembly + tests and check the MIR output of a specific code generation pass. + +Testing Individual Code Generation Passes +----------------------------------------- + +The ``run-pass`` option in llc allows you to create MIR tests that invoke +just a single code generation pass. When this option is used, llc will parse +an input MIR file, run the specified code generation pass, and print the +resulting MIR to the standard output stream. + +You can generate an input MIR file for the test by using the ``stop-after`` +option in llc. For example, if you would like to write a test for the +post register allocation pseudo instruction expansion pass, you can specify +the machine copy propagation pass in the ``stop-after`` option, as it runs +just before the pass that we are trying to test: + + ``llc -stop-after machine-cp bug-trigger.ll > test.mir`` + +After generating the input MIR file, you'll have to add a run line that uses +the ``-run-pass`` option to it. In order to test the post register allocation +pseudo instruction expansion pass on X86-64, a run line like the one shown +below can be used: + + ``# RUN: llc -run-pass postrapseudos -march=x86-64 %s -o /dev/null | FileCheck %s`` + +The MIR files are target dependent, so they have to be placed in the target +specific test directories. They also need to specify a target triple or a +target architecture either in the run line or in the embedded LLVM IR module. + +Limitations +----------- + +Currently the MIR format has several limitations in terms of which state it +can serialize: + +- The target-specific state in the target-specific ``MachineFunctionInfo`` + subclasses isn't serialized at the moment. + +- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and + SystemZ backends) aren't serialized at the moment. + +- The ``MCSymbol`` machine operands are only printed, they can't be parsed. + +- A lot of the state in ``MachineModuleInfo`` isn't serialized - only the CFI + instructions and the variable debug information from MMI is serialized right + now. + +These limitations impose restrictions on what you can test with the MIR format. +For now, tests that would like to test some behaviour that depends on the state +of certain ``MCSymbol`` operands or the exception handling state in MMI, can't +use the MIR format. As well as that, tests that test some behaviour that +depends on the state of the target specific ``MachineFunctionInfo`` or +``MachineConstantPoolValue`` subclasses can't use the MIR format at the moment. + +High Level Structure +==================== + +.. _embedded-module: + +Embedded Module +--------------- + +When the first YAML document contains a `YAML block literal string`_, the MIR +parser will treat this string as an LLVM assembly language string that +represents an embedded LLVM IR module. +Here is an example of a YAML document that contains an LLVM module: + +.. code-block:: llvm + + --- | + define i32 @inc(i32* %x) { + entry: + %0 = load i32, i32* %x + %1 = add i32 %0, 1 + store i32 %1, i32* %x + ret i32 %1 + } + ... + +.. 
_YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688 + +Machine Functions +----------------- + +The remaining YAML documents contain the machine functions. This is an example +of such YAML document: + +.. code-block:: llvm + + --- + name: inc + tracksRegLiveness: true + liveins: + - { reg: '%rdi' } + body: | + bb.0.entry: + liveins: %rdi + + %eax = MOV32rm %rdi, 1, _, 0, _ + %eax = INC32r killed %eax, implicit-def dead %eflags + MOV32mr killed %rdi, 1, _, 0, _, %eax + RETQ %eax + ... + +The document above consists of attributes that represent the various +properties and data structures in a machine function. + +The attribute ``name`` is required, and its value should be identical to the +name of a function that this machine function is based on. + +The attribute ``body`` is a `YAML block literal string`_. Its value represents +the function's machine basic blocks and their machine instructions. + +Machine Instructions Format Reference +===================================== + +The machine basic blocks and their instructions are represented using a custom, +human readable serialization language. This language is used in the +`YAML block literal string`_ that corresponds to the machine function's body. + +A source string that uses this language contains a list of machine basic +blocks, which are described in the section below. + +Machine Basic Blocks +-------------------- + +A machine basic block is defined in a single block definition source construct +that contains the block's ID. +The example below defines two blocks that have an ID of zero and one: + +.. code-block:: llvm + + bb.0: + <instructions> + bb.1: + <instructions> + +A machine basic block can also have a name. It should be specified after the ID +in the block's definition: + +.. code-block:: llvm + + bb.0.entry: ; This block's name is "entry" + <instructions> + +The block's name should be identical to the name of the IR block that this +machine block is based on. + +Block References +^^^^^^^^^^^^^^^^ + +The machine basic blocks are identified by their ID numbers. Individual +blocks are referenced using the following syntax: + +.. code-block:: llvm + + %bb.<id>[.<name>] + +Examples: + +.. code-block:: llvm + + %bb.0 + %bb.1.then + +Successors +^^^^^^^^^^ + +The machine basic block's successors have to be specified before any of the +instructions: + +.. code-block:: llvm + + bb.0.entry: + successors: %bb.1.then, %bb.2.else + <instructions> + bb.1.then: + <instructions> + bb.2.else: + <instructions> + +The branch weights can be specified in brackets after the successor blocks. +The example below defines a block that has two successors with branch weights +of 32 and 16: + +.. code-block:: llvm + + bb.0.entry: + successors: %bb.1.then(32), %bb.2.else(16) + +.. _bb-liveins: + +Live In Registers +^^^^^^^^^^^^^^^^^ + +The machine basic block's live in registers have to be specified before any of +the instructions: + +.. code-block:: llvm + + bb.0.entry: + liveins: %edi, %esi + +The list of live in registers and successors can be empty. The language also +allows multiple live in register and successor lists - they are combined into +one list by the parser. + +Miscellaneous Attributes +^^^^^^^^^^^^^^^^^^^^^^^^ + +The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be +specified in brackets after the block's definition: + +.. code-block:: llvm + + bb.0.entry (address-taken): + <instructions> + bb.2.else (align 4): + <instructions> + bb.3(landing-pad, align 4): + <instructions> + +.. 
TODO: Describe the way the reference to an unnamed LLVM IR block can be + preserved. + +Machine Instructions +-------------------- + +A machine instruction is composed of a name, +:ref:`machine operands <machine-operands>`, +:ref:`instruction flags <instruction-flags>`, and machine memory operands. + +The instruction's name is usually specified before the operands. The example +below shows an instance of the X86 ``RETQ`` instruction with a single machine +operand: + +.. code-block:: llvm + + RETQ %eax + +However, if the machine instruction has one or more explicitly defined register +operands, the instruction's name has to be specified after them. The example +below shows an instance of the AArch64 ``LDPXpost`` instruction with three +defined register operands: + +.. code-block:: llvm + + %sp, %fp, %lr = LDPXpost %sp, 2 + +The instruction names are serialized using the exact definitions from the +target's ``*InstrInfo.td`` files, and they are case sensitive. This means that +similar instruction names like ``TSTri`` and ``tSTRi`` represent different +machine instructions. + +.. _instruction-flags: + +Instruction Flags +^^^^^^^^^^^^^^^^^ + +The flag ``frame-setup`` can be specified before the instruction's name: + +.. code-block:: llvm + + %fp = frame-setup ADDXri %sp, 0, 0 + +.. _registers: + +Registers +--------- + +Registers are one of the key primitives in the machine instructions +serialization language. They are primarly used in the +:ref:`register machine operands <register-operands>`, +but they can also be used in a number of other places, like the +:ref:`basic block's live in list <bb-liveins>`. + +The physical registers are identified by their name. They use the following +syntax: + +.. code-block:: llvm + + %<name> + +The example below shows three X86 physical registers: + +.. code-block:: llvm + + %eax + %r15 + %eflags + +The virtual registers are identified by their ID number. They use the following +syntax: + +.. code-block:: llvm + + %<id> + +Example: + +.. code-block:: llvm + + %0 + +The null registers are represented using an underscore ('``_``'). They can also be +represented using a '``%noreg``' named register, although the former syntax +is preferred. + +.. _machine-operands: + +Machine Operands +---------------- + +There are seventeen different kinds of machine operands, and all of them, except +the ``MCSymbol`` operand, can be serialized. The ``MCSymbol`` operands are +just printed out - they can't be parsed back yet. + +Immediate Operands +^^^^^^^^^^^^^^^^^^ + +The immediate machine operands are untyped, 64-bit signed integers. The +example below shows an instance of the X86 ``MOV32ri`` instruction that has an +immediate machine operand ``-42``: + +.. code-block:: llvm + + %eax = MOV32ri -42 + +.. TODO: Describe the CIMM (Rare) and FPIMM immediate operands. + +.. _register-operands: + +Register Operands +^^^^^^^^^^^^^^^^^ + +The :ref:`register <registers>` primitive is used to represent the register +machine operands. The register operands can also have optional +:ref:`register flags <register-flags>`, +:ref:`a subregister index <subregister-indices>`, +and a reference to the tied register operand. +The full syntax of a register operand is shown below: + +.. code-block:: llvm + + [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ] + +This example shows an instance of the X86 ``XOR32rr`` instruction that has +5 register operands with different register flags: + +.. 
code-block:: llvm + + dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al + +.. _register-flags: + +Register Flags +~~~~~~~~~~~~~~ + +The table below shows all of the possible register flags along with the +corresponding internal ``llvm::RegState`` representation: + +.. list-table:: + :header-rows: 1 + + * - Flag + - Internal Value + + * - ``implicit`` + - ``RegState::Implicit`` + + * - ``implicit-def`` + - ``RegState::ImplicitDefine`` + + * - ``def`` + - ``RegState::Define`` + + * - ``dead`` + - ``RegState::Dead`` + + * - ``killed`` + - ``RegState::Kill`` + + * - ``undef`` + - ``RegState::Undef`` + + * - ``internal`` + - ``RegState::InternalRead`` + + * - ``early-clobber`` + - ``RegState::EarlyClobber`` + + * - ``debug-use`` + - ``RegState::Debug`` + +.. _subregister-indices: + +Subregister Indices +~~~~~~~~~~~~~~~~~~~ + +The register machine operands can reference a portion of a register by using +the subregister indices. The example below shows an instance of the ``COPY`` +pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy 8 +lower bits from the 32-bit virtual register 0 to the 8-bit virtual register 1: + +.. code-block:: llvm + + %1 = COPY %0:sub_8bit + +The names of the subregister indices are target specific, and are typically +defined in the target's ``*RegisterInfo.td`` file. + +Global Value Operands +^^^^^^^^^^^^^^^^^^^^^ + +The global value machine operands reference the global values from the +:ref:`embedded LLVM IR module <embedded-module>`. +The example below shows an instance of the X86 ``MOV64rm`` instruction that has +a global value operand named ``G``: + +.. code-block:: llvm + + %rax = MOV64rm %rip, 1, _, @G, _ + +The named global values are represented using an identifier with the '@' prefix. +If the identifier doesn't match the regular expression +`[-a-zA-Z$._][-a-zA-Z$._0-9]*`, then this identifier must be quoted. + +The unnamed global values are represented using an unsigned numeric value with +the '@' prefix, like in the following examples: ``@0``, ``@989``. + +.. TODO: Describe the parsers default behaviour when optional YAML attributes + are missing. +.. TODO: Describe the syntax for the bundled instructions. +.. TODO: Describe the syntax for virtual register YAML definitions. +.. TODO: Describe the machine function's YAML flag attributes. +.. TODO: Describe the syntax for the external symbol and register + mask machine operands. +.. TODO: Describe the frame information YAML mapping. +.. TODO: Describe the syntax of the stack object machine operands and their + YAML definitions. +.. TODO: Describe the syntax of the constant pool machine operands and their + YAML definitions. +.. TODO: Describe the syntax of the jump table machine operands and their + YAML definitions. +.. TODO: Describe the syntax of the block address machine operands. +.. TODO: Describe the syntax of the CFI index machine operands. +.. TODO: Describe the syntax of the metadata machine operands, and the + instructions debug location attribute. +.. TODO: Describe the syntax of the target index machine operands. +.. TODO: Describe the syntax of the register live out machine operands. +.. TODO: Describe the syntax of the machine memory operands. diff --git a/docs/Phabricator.rst b/docs/Phabricator.rst index 3426bfff164f..73704d9b17d7 100644 --- a/docs/Phabricator.rst +++ b/docs/Phabricator.rst @@ -21,7 +21,7 @@ click the power icon in the top right. You can register with a GitHub account, a Google account, or you can create your own profile. 
Make *sure* that the email address registered with Phabricator is subscribed -to the relevant -commits mailing list. If your are not subscribed to the commit +to the relevant -commits mailing list. If you are not subscribed to the commit list, all mail sent by Phabricator on your behalf will be held for moderation. Note that if you use your Subversion user name as Phabricator user name, @@ -66,7 +66,7 @@ To upload a new patch: * Leave the drop down on *Create a new Revision...* and click *Continue*. * Enter a descriptive title and summary. The title and summary are usually in the form of a :ref:`commit message <commit messages>`. -* Add reviewers and mailing +* Add reviewers (see below for advice) and subscribe mailing lists that you want to be included in the review. If your patch is for LLVM, add llvm-commits as a Subscriber; if your patch is for Clang, add cfe-commits. @@ -83,6 +83,24 @@ To submit an updated patch: * Leave the Repository and Project fields blank. * Add comments about the changes in the new diff. Click *Save*. +Choosing reviewers: You typically pick one or two people as initial reviewers. +This choice is not crucial, because you are merely suggesting and not requiring +them to participate. Many people will see the email notification on cfe-commits +or llvm-commits, and if the subject line suggests the patch is something they +should look at, they will. + +Here are a couple of ways to pick the initial reviewer(s): + +* Use ``svn blame`` and the commit log to find names of people who have + recently modified the same area of code that you are modifying. +* Look in CODE_OWNERS.TXT to see who might be responsible for that area. +* If you've discussed the change on a dev list, the people who participated + might be appropriate reviewers. + +Even if you think the code owner is the busiest person in the world, it's still +okay to put them as a reviewer. Being the code owner means they have accepted +responsibility for making sure the review happens. + Reviewing code with Phabricator ------------------------------- @@ -162,6 +180,6 @@ trivially a good fit for an official LLVM project. .. _LLVM's Phabricator: http://reviews.llvm.org .. _`http://reviews.llvm.org`: http://reviews.llvm.org .. _Code Repository Browser: http://reviews.llvm.org/diffusion/ -.. _Arcanist Quick Start: http://www.phabricator.com/docs/phabricator/article/Arcanist_Quick_Start.html -.. _Arcanist User Guide: http://www.phabricator.com/docs/phabricator/article/Arcanist_User_Guide.html +.. _Arcanist Quick Start: https://secure.phabricator.com/book/phabricator/article/arcanist_quick_start/ +.. _Arcanist User Guide: https://secure.phabricator.com/book/phabricator/article/arcanist/ .. _llvm-reviews GitHub project: https://github.com/r4nt/llvm-reviews/ diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst index 08cc61a187b5..44f76fef8f1f 100644 --- a/docs/ProgrammersManual.rst +++ b/docs/ProgrammersManual.rst @@ -366,7 +366,7 @@ Then you can run your pass like this: Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not have to create "yet another" command line option for the debug output for your -pass. Note that ``DEBUG()`` macros are disabled for optimized builds, so they +pass. Note that ``DEBUG()`` macros are disabled for non-asserts builds, so they do not cause a performance impact at all (for the same reason, they should also not contain side-effects!). 
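A minimal sketch of how a pass might use ``DEBUG()`` in practice (the headers pulled in, the ``"mypass"`` debug type, and the helper function below are illustrative assumptions for this sketch, not code taken from the patch):

.. code-block:: c++

    #include "llvm/IR/Function.h"
    #include "llvm/Support/Debug.h"
    #include "llvm/Support/raw_ostream.h"

    // DEBUG(X) expands to DEBUG_WITH_TYPE(DEBUG_TYPE, X), so a debug type
    // must be defined before the macro is used.
    #define DEBUG_TYPE "mypass"   // hypothetical debug type for this example

    using namespace llvm;

    // Hypothetical helper: the message is emitted only in asserts builds,
    // and only when opt is run with -debug or -debug-only=mypass.
    static void traceFunction(const Function &F) {
      DEBUG(errs() << "mypass: visiting " << F.getName() << "\n");
    }

Running ``opt -mypass -debug-only=mypass`` on some bitcode would then print one trace line per visited function, while a release (non-asserts) build compiles the call away entirely.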
@@ -383,21 +383,17 @@ Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option Sometimes you may find yourself in a situation where enabling ``-debug`` just turns on **too much** information (such as when working on the code generator). If you want to enable debug information with more fine-grained control, you -can define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as +should define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as follows: .. code-block:: c++ - #undef DEBUG_TYPE - DEBUG(errs() << "No debug type\n"); #define DEBUG_TYPE "foo" DEBUG(errs() << "'foo' debug type\n"); #undef DEBUG_TYPE #define DEBUG_TYPE "bar" DEBUG(errs() << "'bar' debug type\n")); #undef DEBUG_TYPE - #define DEBUG_TYPE "" - DEBUG(errs() << "No debug type (2)\n"); Then you can run your pass like this: @@ -406,24 +402,22 @@ Then you can run your pass like this: $ opt < a.bc > /dev/null -mypass <no output> $ opt < a.bc > /dev/null -mypass -debug - No debug type 'foo' debug type 'bar' debug type - No debug type (2) $ opt < a.bc > /dev/null -mypass -debug-only=foo 'foo' debug type $ opt < a.bc > /dev/null -mypass -debug-only=bar 'bar' debug type Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file, -to specify the debug type for the entire module (if you do this before you -``#include "llvm/Support/Debug.h"``, you don't have to insert the ugly -``#undef``'s). Also, you should use names more meaningful than "foo" and "bar", -because there is no system in place to ensure that names do not conflict. If -two different modules use the same string, they will all be turned on when the -name is specified. This allows, for example, all debug information for -instruction scheduling to be enabled with ``-debug-only=InstrSched``, even if -the source lives in multiple files. +to specify the debug type for the entire module. Be careful that you only do +this after including Debug.h and not around any #include of headers. Also, you +should use names more meaningful than "foo" and "bar", because there is no +system in place to ensure that names do not conflict. If two different modules +use the same string, they will all be turned on when the name is specified. +This allows, for example, all debug information for instruction scheduling to be +enabled with ``-debug-only=InstrSched``, even if the source lives in multiple +files. For performance reasons, -debug-only is not available in optimized build (``--enable-optimized``) of LLVM. @@ -435,10 +429,8 @@ preceding example could be written as: .. code-block:: c++ - DEBUG_WITH_TYPE("", errs() << "No debug type\n"); DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n"); DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n")); - DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n"); .. _Statistic: diff --git a/docs/README.txt b/docs/README.txt index 3d6342929808..31764b2951b2 100644 --- a/docs/README.txt +++ b/docs/README.txt @@ -44,7 +44,7 @@ viewable online (as noted above) at e.g. Checking links ============== -The reachibility of external links in the documentation can be checked by +The reachability of external links in the documentation can be checked by running: cd docs/ diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst index b68f5ecd493e..b3f7c005ed19 100644 --- a/docs/ReleaseNotes.rst +++ b/docs/ReleaseNotes.rst @@ -1,15 +1,21 @@ ====================== -LLVM 3.7 Release Notes +LLVM 3.8 Release Notes ====================== .. contents:: :local: +.. 
warning:: + These are in-progress notes for the upcoming LLVM 3.8 release. You may + prefer the `LLVM 3.7 Release Notes <http://llvm.org/releases/3.7.0/docs + /ReleaseNotes.html>`_. + + Introduction ============ This document contains the release notes for the LLVM Compiler Infrastructure, -release 3.7. Here we describe the status of LLVM, including major improvements +release 3.8. Here we describe the status of LLVM, including major improvements from the previous release, improvements in various subprojects of LLVM, and some of the current users of the code. All LLVM releases may be downloaded from the `LLVM releases web site <http://llvm.org/releases/>`_. @@ -25,467 +31,105 @@ LLVM web page, this document applies to the *next* release, not the current one. To see the release notes for a specific release, please see the `releases page <http://llvm.org/releases/>`_. -Major changes in 3.7.1 -====================== - -* 3.7.0 was released with an inadvertent change to the signature of the C - API function: LLVMBuildLandingPad, which made the C API incompatible with - prior releases. This has been corrected in LLVM 3.7.1. - - As a result of this change, 3.7.0 is not ABI compatible with 3.7.1. - - +----------------------------------------------------------------------------+ - | History of the LLVMBuildLandingPad() function | - +===========================+================================================+ - | 3.6.2 and prior releases | LLVMBuildLandingPad(LLVMBuilderRef, | - | | LLVMTypeRef, | - | | LLVMValueRef, | - | | unsigned, const char*) | - +---------------------------+------------------------------------------------+ - | 3.7.0 | LLVMBuildLandingPad(LLVMBuilderRef, | - | | LLVMTypeRef, | - | | unsigned, const char*) | - +---------------------------+------------------------------------------------+ - | 3.7.1 and future releases | LLVMBuildLandingPad(LLVMBuilderRef, | - | | LLVMTypeRef, | - | | LLVMValueRef, | - | | unsigned, const char*) | - +---------------------------+------------------------------------------------+ - - -Non-comprehensive list of changes in 3.7.0 +Non-comprehensive list of changes in this release ================================================= +* With this release, the minimum Windows version required for running LLVM is + Windows 7. Earlier versions, including Windows Vista and XP are no longer + supported. -.. NOTE - For small 1-3 sentence descriptions, just add an entry at the end of - this list. If your description won't fit comfortably in one bullet - point (e.g. maybe you would like to give an example of the - functionality, or simply have a lot to talk about), see the `NOTE` below - for adding a new subsection. - -* The minimum required Visual Studio version for building LLVM is now 2013 - Update 4. - -* A new documentation page, :doc:`Frontend/PerformanceTips`, contains a - collection of tips for frontend authors on how to generate IR which LLVM is - able to effectively optimize. - -* The ``DataLayout`` is no longer optional. All the IR level optimizations expects - it to be present and the API has been changed to use a reference instead of - a pointer to make it explicit. The Module owns the datalayout and it has to - match the one attached to the TargetMachine for generating code. +* With this release, the autoconf build system is deprecated. It will be removed + in the 3.9 release. Please migrate to using CMake. 
For more information see: + `Building LLVM with CMake <CMake.html>`_ - In 3.6, a pass was inserted in the pipeline to make the ``DataLayout`` accessible: - ``MyPassManager->add(new DataLayoutPass(MyTargetMachine->getDataLayout()));`` - In 3.7, you don't need a pass, you set the ``DataLayout`` on the ``Module``: - ``MyModule->setDataLayout(MyTargetMachine->createDataLayout());`` +* The C API function LLVMLinkModules is deprecated. It will be removed in the + 3.9 release. Please migrate to LLVMLinkModules2. Unlike the old function the + new one - The LLVM C API ``LLVMGetTargetMachineData`` is deprecated to reflect the fact - that it won't be available anymore from ``TargetMachine`` in 3.8. + * Doesn't take an unused parameter. + * Destroys the source instead of only damaging it. + * Does not record a message. Use the diagnostic handler instead. -* Comdats are now orthogonal to the linkage. LLVM will not create - comdats for weak linkage globals and the frontends are responsible - for explicitly adding them. +* The C API functions LLVMParseBitcode, LLVMParseBitcodeInContext, + LLVMGetBitcodeModuleInContext and LLVMGetBitcodeModule have been deprecated. + They will be removed in 3.9. Please migrate to the versions with a 2 suffix. + Unlike the old ones the new ones do not record a diagnostic message. Use + the diagnostic handler instead. -* On ELF we now support multiple sections with the same name and - comdat. This allows for smaller object files since multiple - sections can have a simple name (`.text`, `.rodata`, etc). +* The deprecated C APIs LLVMGetBitcodeModuleProviderInContext and + LLVMGetBitcodeModuleProvider have been removed. -* LLVM now lazily loads metadata in some cases. Creating archives - with IR files with debug info is now 25X faster. +* The deprecated C APIs LLVMCreateExecutionEngine, LLVMCreateInterpreter, + LLVMCreateJITCompiler, LLVMAddModuleProvider and LLVMRemoveModuleProvider + have been removed. -* llvm-ar can create archives in the BSD format used by OS X. +* With this release, the C API headers have been reorganized to improve build + time. Type specific declarations have been moved to Type.h, and error + handling routines have been moved to ErrorHandling.h. Both are included in + Core.h so nothing should change for projects directly including the headers, + but transitive dependencies may be affected. -* LLVM received a backend for the extended Berkely Packet Filter - instruction set that can be dynamically loaded into the Linux kernel via the - `bpf(2) <http://man7.org/linux/man-pages/man2/bpf.2.html>`_ syscall. - - Support for BPF has been present in the kernel for some time, but starting - from 3.18 has been extended with such features as: 64-bit registers, 8 - additional registers registers, conditional backwards jumps, call - instruction, shift instructions, map (hash table, array, etc.), 1-8 byte - load/store from stack, and more. +.. NOTE + For small 1-3 sentence descriptions, just add an entry at the end of + this list. If your description won't fit comfortably in one bullet + point (e.g. maybe you would like to give an example of the + functionality, or simply have a lot to talk about), see the `NOTE` below + for adding a new subsection. - Up until now, users of BPF had to write bytecode by hand, or use - custom generators. This release adds a proper LLVM backend target for the BPF - bytecode architecture. +* ... next change ... 
- The BPF target is now available by default, and options exist in both Clang - (-target bpf) or llc (-march=bpf) to pick eBPF as a backend. +.. NOTE + If you would like to document a larger change, then you can add a + subsection about it right here. You can copy the following boilerplate + and un-indent it (the indentation causes it to be inside this comment). -* Switch-case lowering was rewritten to avoid generating unbalanced search trees - (`PR22262 <http://llvm.org/pr22262>`_) and to exploit profile information - when available. Some lowering strategies are now disabled when optimizations - are turned off, to save compile time. + Special New Feature + ------------------- -* The debug info IR class hierarchy now inherits from ``Metadata`` and has its - own bitcode records and assembly syntax - (`documented in LangRef <LangRef.html#specialized-metadata-nodes>`_). The debug - info verifier has been merged with the main verifier. + Makes programs 10x faster by doing Special New Thing. -* LLVM IR and APIs are in a period of transition to aid in the removal of - pointer types (the end goal being that pointers are typeless/opaque - void*, - if you will). Some APIs and IR constructs have been modified to take - explicit types that are currently checked to match the target type of their - pre-existing pointer type operands. Further changes are still needed, but the - more you can avoid using ``PointerType::getPointeeType``, the easier the - migration will be. +Changes to the ARM Backend +-------------------------- -* Argument-less ``TargetMachine::getSubtarget`` and - ``TargetMachine::getSubtargetImpl`` have been removed from the tree. Updating - out of tree ports is as simple as implementing a non-virtual version in the - target, but implementing full ``Function`` based ``TargetSubtargetInfo`` - support is recommended. + During this release ... -* This is expected to be the last major release of LLVM that supports being - run on Windows XP and Windows Vista. For the next major release the minimum - Windows version requirement will be Windows 7. Changes to the MIPS Target -------------------------- -During this release the MIPS target has: - -* Added support for MIPS32R3, MIPS32R5, MIPS32R3, MIPS32R5, and microMIPS32. - -* Added support for dynamic stack realignment. This is of particular importance - to MSA on 32-bit subtargets since vectors always exceed the stack alignment on - the O32 ABI. - -* Added support for compiler-rt including: - - * Support for the Address, and Undefined Behaviour Sanitizers for all MIPS - subtargets. + During this release ... - * Support for the Data Flow, and Memory Sanitizer for 64-bit subtargets. - - * Support for the Profiler for all MIPS subtargets. - -* Added support for libcxx, and libcxxabi. - -* Improved inline assembly support such that memory constraints may now make use - of the appropriate address offsets available to the instructions. Also, added - support for the ``ZC`` constraint. - -* Added support for 128-bit integers on 64-bit subtargets and 16-bit floating - point conversions on all subtargets. - -* Added support for read-only ``.eh_frame`` sections by storing type information - indirectly. - -* Added support for MCJIT on all 64-bit subtargets as well as MIPS32R6. - -* Added support for fast instruction selection on MIPS32 and MIPS32R2 with PIC. - -* Various bug fixes. 
Including the following notable fixes: - - * Fixed 'jumpy' debug line info around calls where calculation of the address - of the function would inappropriately change the line number. - - * Fixed missing ``__mips_isa_rev`` macro on the MIPS32R6 and MIPS32R6 - subtargets. - - * Fixed representation of NaN when targeting systems using traditional - encodings. Traditionally, MIPS has used NaN encodings that were compatible - with IEEE754-1985 but would later be found incompatible with IEEE754-2008. - - * Fixed multiple segfaults and assertions in the disassembler when - disassembling instructions that have memory operands. - - * Fixed multiple cases of suboptimal code generation involving $zero. - - * Fixed code generation of 128-bit shifts on 64-bit subtargets. - - * Prevented the delay slot filler from filling call delay slots with - instructions that modify or use $ra. - - * Fixed some remaining N32/N64 calling convention bugs when using small - structures on big-endian subtargets. - - * Fixed missing sign-extensions that are required by the N32/N64 calling - convention when generating calls to library functions with 32-bit - parameters. - - * Corrected the ``int64_t`` typedef to be ``long`` for N64. - - * ``-mno-odd-spreg`` is now honoured for vector insertion/extraction - operations when using -mmsa. - - * Fixed vector insertion and extraction for MSA on 64-bit subtargets. - - * Corrected the representation of member function pointers. This makes them - usable on microMIPS subtargets. Changes to the PowerPC Target ----------------------------- -There are numerous improvements to the PowerPC target in this release: - -* LLVM now supports the ISA 2.07B (POWER8) instruction set, including - direct moves between general registers and vector registers, and - built-in support for hardware transactional memory (HTM). Some missing - instructions from ISA 2.06 (POWER7) were also added. - -* Code generation for the local-dynamic and global-dynamic thread-local - storage models has been improved. - -* Loops may be restructured to leverage pre-increment loads and stores. - -* QPX - The vector instruction set used by the IBM Blue Gene/Q supercomputers - is now supported. - -* Loads from the TOC area are now correctly treated as invariant. + During this release ... -* PowerPC now has support for i128 and v1i128 types. The types differ - in how they are passed in registers for the ELFv2 ABI. -* Disassembly will now print shorter mnemonic aliases when available. - -* Optional register name prefixes for VSX and QPX registers are now - supported in the assembly parser. - -* The back end now contains a pass to remove unnecessary vector swaps - from POWER8 little-endian code generation. Additional improvements - are planned for release 3.8. - -* The undefined-behavior sanitizer (UBSan) is now supported for PowerPC. - -* Many new vector programming APIs have been added to altivec.h. - Additional ones are planned for release 3.8. - -* PowerPC now supports __builtin_call_with_static_chain. - -* PowerPC now supports the revised -mrecip option that permits finer - control over reciprocal estimates. - -* Many bugs have been identified and fixed. - -Changes to the SystemZ Target +Changes to the X86 Target ----------------------------- -* LLVM no longer attempts to automatically detect the current host CPU when - invoked natively. - -* Support for all thread-local storage models. (Previous releases would support - only the local-exec TLS model.) - -* The POPCNT instruction is now used on z196 and above. 
- -* The RISBGN instruction is now used on zEC12 and above. - -* Support for the transactional-execution facility on zEC12 and above. - -* Support for the z13 processor and its vector facility. - - -Changes to the JIT APIs ------------------------ - -* Added a new C++ JIT API called On Request Compilation, or ORC. - - ORC is a new JIT API inspired by MCJIT but designed to be more testable, and - easier to extend with new features. A key new feature already in tree is lazy, - function-at-a-time compilation for X86. Also included is a reimplementation of - MCJIT's API and behavior (OrcMCJITReplacement). MCJIT itself remains in tree, - and continues to be the default JIT ExecutionEngine, though new users are - encouraged to try ORC out for their projects. (A good place to start is the - new ORC tutorials under llvm/examples/kaleidoscope/orc). - -Sub-project Status Update -========================= - -In addition to the core LLVM 3.7 distribution of production-quality compiler -infrastructure, the LLVM project includes sub-projects that use the LLVM core -and share the same distribution license. This section provides updates on these -sub-projects. - -Polly - The Polyhedral Loop Optimizer in LLVM ---------------------------------------------- - -`Polly <http://polly.llvm.org>`_ is a polyhedral loop optimization -infrastructure that provides data-locality optimizations to LLVM-based -compilers. When compiled as part of clang or loaded as a module into clang, -it can perform loop optimizations such as tiling, loop fusion or outer-loop -vectorization. As a generic loop optimization infrastructure it allows -developers to get a per-loop-iteration model of a loop nest on which detailed -analysis and transformations can be performed. - -Changes since the last release: - -* isl imported into Polly distribution - - `isl <http://repo.or.cz/w/isl.git>`_, the math library Polly uses, has been - imported into the source code repository of Polly and is now distributed as part - of Polly. As this was the last external library dependency of Polly, Polly can - now be compiled right after checking out the Polly source code without the need - for any additional libraries to be pre-installed. - -* Small integer optimization of isl - - The MIT licensed imath backend using in `isl <http://repo.or.cz/w/isl.git>`_ for - arbitrary width integer computations has been optimized to use native integer - operations for the common case where the operands of a computation fit into 32 - bit and to only fall back to large arbitrary precision integers for the - remaining cases. This optimization has greatly improved the compile-time - performance of Polly, both due to faster native operations also due to a - reduction in malloc traffic and pointer indirections. As a result, computations - that use arbitrary precision integers heavily have been speed up by almost 6x. - As a result, the compile-time of Polly on the Polybench test kernels in the LNT - suite has been reduced by 20% on average with compile time reductions between - 9-43%. + During this release ... -* Schedule Trees +* TLS is enabled for Cygwin as emutls. - Polly now uses internally so-called > Schedule Trees < to model the loop - structure it optimizes. Schedule trees are an easy to understand tree structure - that describes a loop nest using integer constraint sets to keep track of - execution constraints. It allows the developer to use per-tree-node operations - to modify the loop tree. 
Programatic analysis that work on the schedule tree - (e.g., as dependence analysis) also show a visible speedup as they can exploit - the tree structure of the schedule and need to fall back to ILP based - optimization problems less often. Section 6 of `Polyhedral AST generation is - more than scanning polyhedra - <http://www.grosser.es/#pub-polyhedral-AST-generation>`_ gives a detailed - explanation of this schedule trees. -* Scalar and PHI node modeling - Polly as an analysis - - Polly now requires almost no preprocessing to analyse LLVM-IR, which makes it - easier to use Polly as a pure analysis pass e.g. to provide more precise - dependence information to non-polyhedral transformation passes. Originally, - Polly required the input LLVM-IR to be preprocessed such that all scalar and - PHI-node dependences are translated to in-memory operations. Since this release, - Polly has full support for scalar and PHI node dependences and requires no - scalar-to-memory translation for such kind of dependences. - -* Modeling of modulo and non-affine conditions - - Polly can now supports modulo operations such as A[t%2][i][j] as they appear - often in stencil computations and also allows data-dependent conditional - branches as they result e.g. from ternary conditions ala A[i] > 255 ? 255 : - A[i]. - -* Delinearization - - Polly now support the analysis of manually linearized multi-dimensional arrays - as they result form macros such as - "#define 2DARRAY(A,i,j) (A.data[(i) * A.size + (j)]". Similar constructs appear - in old C code written before C99, C++ code such as boost::ublas, LLVM exported - from Julia, Matlab generated code and many others. Our work titled - `Optimistic Delinearization of Parametrically Sized Arrays - <http://www.grosser.es/#pub-optimistic-delinerization>`_ gives details. - -* Compile time improvements - - Pratik Bahtu worked on compile-time performance tuning of Polly. His work - together with the support for schedule trees and the small integer optimization - in isl notably reduced the compile time. - -* Increased compute timeouts - - As Polly's compile time has been notabily improved, we were able to increase - the compile time saveguards in Polly. As a result, the default configuration - of Polly can now analyze larger loop nests without running into compile time - restrictions. - -* Export Debug Locations via JSCoP file +Changes to the OCaml bindings +----------------------------- - Polly's JSCoP import/export format gained support for debug locations that show - to the user the source code location of detected scops. - -* Improved windows support - - The compilation of Polly on windows using cmake has been improved and several - visual studio build issues have been addressed. + During this release ... -* Many bug fixes - -libunwind ---------- - -The unwind implementation which use to reside in `libc++abi` has been moved into -a separate repository. This implementation can still be used for `libc++abi` by -specifying `-DLIBCXXABI_USE_LLVM_UNWINDER=YES` and -`-DLIBCXXABI_LIBUNWIND_PATH=<path to libunwind source>` when configuring -`libc++abi`, which defaults to `true` when building on ARM. +* The ocaml function link_modules has been replaced with link_modules' which + uses LLVMLinkModules2. -The new repository can also be built standalone if just `libunwind` is desired. 
-External Open Source Projects Using LLVM 3.7 +External Open Source Projects Using LLVM 3.8 ============================================ An exciting aspect of LLVM is that it is used as an enabling technology for a lot of other language and tools projects. This section lists some of the -projects that have already been updated to work with LLVM 3.7. - - -LDC - the LLVM-based D compiler -------------------------------- - -`D <http://dlang.org>`_ is a language with C-like syntax and static typing. It -pragmatically combines efficiency, control, and modeling power, with safety and -programmer productivity. D supports powerful concepts like Compile-Time Function -Execution (CTFE) and Template Meta-Programming, provides an innovative approach -to concurrency and offers many classical paradigms. - -`LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler -combined with LLVM as backend to produce efficient native code. LDC targets -x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on -PowerPC (32/64 bit). Ports to other architectures like ARM, AArch64 and MIPS64 -are underway. - -Portable Computing Language (pocl) ----------------------------------- +projects that have already been updated to work with LLVM 3.8. -In addition to producing an easily portable open source OpenCL -implementation, another major goal of `pocl <http://portablecl.org/>`_ -is improving performance portability of OpenCL programs with -compiler optimizations, reducing the need for target-dependent manual -optimizations. An important part of pocl is a set of LLVM passes used to -statically parallelize multiple work-items with the kernel compiler, even in -the presence of work-group barriers. - - -TTA-based Co-design Environment (TCE) -------------------------------------- - -`TCE <http://tce.cs.tut.fi/>`_ is a toolset for designing customized -exposed datapath processors based on the Transport triggered -architecture (TTA). - -The toolset provides a complete co-design flow from C/C++ -programs down to synthesizable VHDL/Verilog and parallel program binaries. -Processor customization points include the register files, function units, -supported operations, and the interconnection network. - -TCE uses Clang and LLVM for C/C++/OpenCL C language support, target independent -optimizations and also for parts of code generation. It generates -new LLVM-based code generators "on the fly" for the designed processors and -loads them in to the compiler backend as runtime libraries to avoid -per-target recompilation of larger parts of the compiler chain. - -BPF Compiler Collection (BCC) ------------------------------ -`BCC <https://github.com/iovisor/bcc>`_ is a Python + C framework for tracing and -networking that is using Clang rewriter + 2nd pass of Clang + BPF backend to -generate eBPF and push it into the kernel. - -LLVMSharp & ClangSharp ----------------------- - -`LLVMSharp <http://www.llvmsharp.org>`_ and -`ClangSharp <http://www.clangsharp.org>`_ are type-safe C# bindings for -Microsoft.NET and Mono that Platform Invoke into the native libraries. -ClangSharp is self-hosted and is used to generated LLVMSharp using the -LLVM-C API. - -`LLVMSharp Kaleidoscope Tutorials <http://www.llvmsharp.org/Kaleidoscope/>`_ -are instructive examples of writing a compiler in C#, with certain improvements -like using the visitor pattern to generate LLVM IR. 
- -`ClangSharp PInvoke Generator <http://www.clangsharp.org/PInvoke/>`_ is the -self-hosting mechanism for LLVM/ClangSharp and is demonstrative of using -LibClang to generate Platform Invoke (PInvoke) signatures for C APIs. +* A project Additional Information @@ -500,3 +144,4 @@ going into the ``llvm/docs/`` directory in the LLVM tree. If you have any questions or comments about LLVM, please feel free to contact us via the `mailing lists <http://llvm.org/docs/#maillist>`_. + diff --git a/docs/ReleaseProcess.rst b/docs/ReleaseProcess.rst index c4bbc91c63ce..d7f703126019 100644 --- a/docs/ReleaseProcess.rst +++ b/docs/ReleaseProcess.rst @@ -53,7 +53,7 @@ test-release.sh --------------- This script will check-out, configure and compile LLVM+Clang (+ most add-ons, like ``compiler-rt``, -``libcxx`` and ``clang-extra-tools``) in three stages, and will test the final stage. +``libcxx``, ``libomp`` and ``clang-extra-tools``) in three stages, and will test the final stage. It'll have installed the final binaries on the Phase3/Releasei(+Asserts) directory, and that's the one you should use for the test-suite and other external tests. diff --git a/docs/SourceLevelDebugging.rst b/docs/SourceLevelDebugging.rst index 99186f581881..270c44eb50ba 100644 --- a/docs/SourceLevelDebugging.rst +++ b/docs/SourceLevelDebugging.rst @@ -231,7 +231,7 @@ Compiled to LLVM, this function would be represented like this: .. code-block:: llvm ; Function Attrs: nounwind ssp uwtable - define void @foo() #0 { + define void @foo() #0 !dbg !4 { entry: %X = alloca i32, align 4 %Y = alloca i32, align 4 @@ -263,20 +263,20 @@ Compiled to LLVM, this function would be represented like this: !1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info") !2 = !{} !3 = !{!4} - !4 = !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, function: void ()* @foo, variables: !2) + !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, variables: !2) !5 = !DISubroutineType(types: !6) !6 = !{null} !7 = !{i32 2, !"Dwarf Version", i32 2} !8 = !{i32 2, !"Debug Info Version", i32 3} !9 = !{i32 1, !"PIC Level", i32 2} !10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"} - !11 = !DILocalVariable(tag: DW_TAG_auto_variable, name: "X", scope: !4, file: !1, line: 2, type: !12) + !11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12) !12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed) !13 = !DIExpression() !14 = !DILocation(line: 2, column: 9, scope: !4) - !15 = !DILocalVariable(tag: DW_TAG_auto_variable, name: "Y", scope: !4, file: !1, line: 3, type: !12) + !15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12) !16 = !DILocation(line: 3, column: 9, scope: !4) - !17 = !DILocalVariable(tag: DW_TAG_auto_variable, name: "Z", scope: !18, file: !1, line: 5, type: !12) + !17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12) !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5) !19 = !DILocation(line: 5, column: 11, scope: !18) !20 = !DILocation(line: 6, column: 11, scope: !18) @@ -304,10 +304,9 @@ scope information for the variable ``X``. .. 
code-block:: llvm !14 = !DILocation(line: 2, column: 9, scope: !4) - !4 = !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, - isLocal: false, isDefinition: true, scopeLine: 1, - isOptimized: false, function: void ()* @foo, - variables: !2) + !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, + isLocal: false, isDefinition: true, scopeLine: 1, + isOptimized: false, variables: !2) Here ``!14`` is metadata providing `location information <LangRef.html#dilocation>`_. In this example, scope is encoded by ``!4``, a @@ -368,15 +367,14 @@ C/C++ source file information ``llvm::Instruction`` provides easy access to metadata attached with an instruction. One can extract line number information encoded in LLVM IR using -``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``. +``Instruction::getDebugLoc()`` and ``DILocation::getLine()``. .. code-block:: c++ - if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction - DILocation Loc(N); // DILocation is in DebugInfo.h - unsigned Line = Loc.getLineNumber(); - StringRef File = Loc.getFilename(); - StringRef Dir = Loc.getDirectory(); + if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction + unsigned Line = Loc->getLine(); + StringRef File = Loc->getFilename(); + StringRef Dir = Loc->getDirectory(); } C/C++ global variable information @@ -464,12 +462,12 @@ a C/C++ front-end would generate the following descriptors: !4 = !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, - function: i32 (i32, i8**)* @main, variables: !2) + variables: !2) ;; ;; Define the subprogram itself. ;; - define i32 @main(i32 %argc, i8** %argv) { + define i32 @main(i32 %argc, i8** %argv) !dbg !4 { ... } @@ -709,7 +707,7 @@ qualified name. Debugger users tend not to enter their search strings as "``a::b::c``". So the name entered in the name table must be demangled in order to chop it up appropriately and additional names must be manually entered into the table to make it effective as a name lookup table for debuggers to -se. +use. All debuggers currently ignore the "``.debug_pubnames``" table as a result of its inconsistent and useless public-only name content making it a waste of diff --git a/docs/StackMaps.rst b/docs/StackMaps.rst index dbdf78f992ca..5bdae38b699d 100644 --- a/docs/StackMaps.rst +++ b/docs/StackMaps.rst @@ -499,3 +499,13 @@ the same requirement imposed by the llvm.gcroot intrinsic.) LLVM transformations must not substitute the alloca with any intervening value. This can be verified by the runtime simply by checking that the stack map's location is a Direct location type. + + +Supported Architectures +======================= + +Support for StackMap generation and the related intrinsics requires +some code for each backend. Today, only a subset of LLVM's backends +are supported. The currently supported architectures are X86_64, +PowerPC, and Aarch64. + diff --git a/docs/Statepoints.rst b/docs/Statepoints.rst index eb5866eb552f..442b1c269c47 100644 --- a/docs/Statepoints.rst +++ b/docs/Statepoints.rst @@ -53,7 +53,7 @@ load barriers, store barriers, and safepoints. loads, merely loads of a particular type (in the original source language), or none at all. -#. Analogously, a store barrier is a code fragement that runs +#. 
Analogously, a store barrier is a code fragment that runs immediately before the machine store instruction, but after the computation of the value stored. The most common use of a store barrier is to update a 'card table' in a generational garbage @@ -142,8 +142,8 @@ resulting relocation sequence is: define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { - %0 = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj) - %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %0, i32 7, i32 7) + %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj) + %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 7, i32 7) ret i8 addrspace(1)* %obj.relocated } @@ -160,7 +160,7 @@ of the call, we use the ``gc.result`` intrinsic. To get the relocation of each pointer in turn, we use the ``gc.relocate`` intrinsic with the appropriate index. Note that both the ``gc.relocate`` and ``gc.result`` are tied to the statepoint. The combination forms a "statepoint relocation -sequence" and represents the entitety of a parseable call or 'statepoint'. +sequence" and represents the entirety of a parseable call or 'statepoint'. When lowered, this example would generate the following x86 assembly: @@ -206,6 +206,52 @@ This example was taken from the tests for the :ref:`RewriteStatepointsForGC` uti opt -rewrite-statepoints-for-gc test/Transforms/RewriteStatepointsForGC/basics.ll -S | llc -debug-only=stackmaps +Base & Derived Pointers +^^^^^^^^^^^^^^^^^^^^^^^ + +A "base pointer" is one which points to the starting address of an allocation +(object). A "derived pointer" is one which is offset from a base pointer by +some amount. When relocating objects, a garbage collector needs to be able +to relocate each derived pointer associated with an allocation to the same +offset from the new address. + +"Interior derived pointers" remain within the bounds of the allocation +they're associated with. As a result, the base object can be found at +runtime provided the bounds of allocations are known to the runtime system. + +"Exterior derived pointers" are outside the bounds of the associated object; +they may even fall within *another* allocations address range. As a result, +there is no way for a garbage collector to determine which allocation they +are associated with at runtime and compiler support is needed. + +The ``gc.relocate`` intrinsic supports an explicit operand for describing the +allocation associated with a derived pointer. This operand is frequently +referred to as the base operand, but does not strictly speaking have to be +a base pointer, but it does need to lie within the bounds of the associated +allocation. Some collectors may require that the operand be an actual base +pointer rather than merely an internal derived pointer. Note that during +lowering both the base and derived pointer operands are required to be live +over the associated call safepoint even if the base is otherwise unused +afterwards. + +If we extend our previous example to include a pointless derived pointer, +we get: + +.. 
code-block:: llvm + + define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) + gc "statepoint-example" { + %gep = getelementptr i8, i8 addrspace(1)* %obj, i64 20000 + %token = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj, i8 addrspace(1)* %gep) + %obj.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %token, i32 7, i32 7) + %gep.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %token, i32 7, i32 8) + %p = getelementptr i8, i8 addrspace(1)* %gep, i64 -20000 + ret i8 addrspace(1)* %p + } + +Note that in this example %p and %obj.relocate are the same address and we +could replace one with the other, potentially removing the derived pointer +from the live set at the safepoint entirely. GC Transitions ^^^^^^^^^^^^^^^^^^ @@ -225,7 +271,7 @@ statepoint. transitions based on the function symbols involved (e.g. a call from a function with GC strategy "foo" to a function with GC strategy "bar"), indirect calls that are also GC transitions must also be supported. This - requirement is the driving force behing the decision to require that GC + requirement is the driving force behind the decision to require that GC transitions are explicitly marked. Let's revisit the sample given above, this time treating the call to ``@foo`` @@ -242,8 +288,8 @@ to unmanaged code. The resulting relocation sequence is: define i8 addrspace(1)* @test1(i8 addrspace(1) *%obj) gc "hypothetical-gc" { - %0 = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 1, i32* @Flag, i32 0, i8 addrspace(1)* %obj) - %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %0, i32 7, i32 7) + %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 1, i32* @Flag, i32 0, i8 addrspace(1)* %obj) + %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 7, i32 7) ret i8 addrspace(1)* %obj.relocated } @@ -296,7 +342,7 @@ Syntax: :: - declare i32 + declare token @llvm.experimental.gc.statepoint(i64 <id>, i32 <num patch bytes>, func_type <target>, i64 <#call args>, i64 <flags>, @@ -331,14 +377,16 @@ the user will patch over the 'num patch bytes' bytes of nops with a calling sequence specific to their runtime before executing the generated machine code. There are no guarantees with respect to the alignment of the nop sequence. Unlike :doc:`StackMaps` statepoints do -not have a concept of shadow bytes. +not have a concept of shadow bytes. Note that semantically the +statepoint still represents a call or invoke to 'target', and the nop +sequence after patching is expected to represent an operation +equivalent to a call or invoke to 'target'. The 'target' operand is the function actually being called. The target can be specified as either a symbolic LLVM function, or as an arbitrary Value of appropriate function type. Note that the function type must match the signature of the callee and the types of the 'call -parameters' arguments. If 'num patch bytes' is non-zero then 'target' -has to be the constant pointer null of the appropriate function type. +parameters' arguments. The '#call args' operand is the number of arguments to the actual call. 
It must exactly match the number of arguments passed in the @@ -408,7 +456,7 @@ Syntax: :: declare type* - @llvm.experimental.gc.result(i32 %statepoint_token) + @llvm.experimental.gc.result(token %statepoint_token) Overview: """"""""" @@ -424,7 +472,7 @@ Operands: The first and only argument is the ``gc.statepoint`` which starts the safepoint sequence of which this ``gc.result`` is a part. -Despite the typing of this as a generic i32, *only* the value defined +Despite the typing of this as a generic token, *only* the value defined by a ``gc.statepoint`` is legal here. Semantics: @@ -448,7 +496,7 @@ Syntax: :: declare <pointer type> - @llvm.experimental.gc.relocate(i32 %statepoint_token, + @llvm.experimental.gc.relocate(token %statepoint_token, i32 %base_offset, i32 %pointer_offset) @@ -463,13 +511,18 @@ Operands: The first argument is the ``gc.statepoint`` which starts the safepoint sequence of which this ``gc.relocation`` is a part. -Despite the typing of this as a generic i32, *only* the value defined +Despite the typing of this as a generic token, *only* the value defined by a ``gc.statepoint`` is legal here. The second argument is an index into the statepoints list of arguments -which specifies the base pointer for the pointer being relocated. +which specifies the allocation for the pointer being relocated. This index must land within the 'gc parameter' section of the -statepoint's argument list. +statepoint's argument list. The associated value must be within the +object with which the pointer being relocated is associated. The optimizer +is free to change *which* interior derived pointer is reported, provided that +it does not replace an actual base pointer with another interior derived +pointer. Collectors are allowed to rely on the base pointer operand +remaining an actual base pointer if so constructed. The third argument is an index into the statepoint's list of arguments which specify the (potentially) derived pointer being relocated. It @@ -590,7 +643,7 @@ As an example, given this code: define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { - call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0) + call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0) ret i8 addrspace(1)* %obj } @@ -600,8 +653,8 @@ The pass would produce this IR: define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { - %0 = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj) - %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %0, i32 12, i32 12) + %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj) + %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 12, i32 12) ret i8 addrspace(1)* %obj.relocated } @@ -612,8 +665,18 @@ non references. Address space 1 is not globally reserved for this purpose. 
This pass can be used an utility function by a language frontend that doesn't want to manually reason about liveness, base pointers, or relocation when constructing IR. As currently implemented, RewriteStatepointsForGC must be -run after SSA construction (i.e. mem2ref). - +run after SSA construction (i.e. mem2ref). + +RewriteStatepointsForGC will ensure that appropriate base pointers are listed +for every relocation created. It will do so by duplicating code as needed to +propagate the base pointer associated with each pointer being relocated to +the appropriate safepoints. The implementation assumes that the following +IR constructs produce base pointers: loads from the heap, addresses of global +variables, function arguments, function return values. Constant pointers (such +as null) are also assumed to be base pointers. In practice, this constraint +can be relaxed to producing interior derived pointers provided the target +collector can find the associated allocation from an arbitrary interior +derived pointer. In practice, RewriteStatepointsForGC can be run much later in the pass pipeline, after most optimization is already done. This helps to improve @@ -654,8 +717,8 @@ This pass would produce the following IR: .. code-block:: llvm define void @test() gc "statepoint-example" { - %safepoint_token = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @do_safepoint, i32 0, i32 0, i32 0, i32 0) - %safepoint_token1 = call i32 (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0) + %safepoint_token = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @do_safepoint, i32 0, i32 0, i32 0, i32 0) + %safepoint_token1 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0) ret void } @@ -699,6 +762,12 @@ deoptimization or introspection) at safepoints. In that case, ask on the llvm-dev mailing list for suggestions. +Supported Architectures +======================= + +Support for statepoint generation requires some code for each backend. +Today, only X86_64 is supported. + Bugs and Enhancements ===================== diff --git a/docs/TestingGuide.rst b/docs/TestingGuide.rst index adf5f3d4cfbd..134ddd88c87d 100644 --- a/docs/TestingGuide.rst +++ b/docs/TestingGuide.rst @@ -240,6 +240,10 @@ The recommended way to examine output to figure out if the test passes is using the :doc:`FileCheck tool <CommandGuide/FileCheck>`. *[The usage of grep in RUN lines is deprecated - please do not send or commit patches that use it.]* +Put related tests into a single file rather than having a separate file per +test. Check if there are files already covering your feature and consider +adding your code there instead of creating a new file. + Extra files ----------- diff --git a/docs/WritingAnLLVMPass.rst b/docs/WritingAnLLVMPass.rst index 1d5a52f21b3f..241066842b7b 100644 --- a/docs/WritingAnLLVMPass.rst +++ b/docs/WritingAnLLVMPass.rst @@ -47,14 +47,11 @@ source tree in the ``lib/Transforms/Hello`` directory. Setting up the build environment -------------------------------- -.. FIXME: Why does this recommend to build in-tree? - -First, configure and build LLVM. This needs to be done directly inside the -LLVM source tree rather than in a separate objects directory. 
Next, you need -to create a new directory somewhere in the LLVM source base. For this example, -we'll assume that you made ``lib/Transforms/Hello``. Finally, you must set up -a build script (``Makefile``) that will compile the source code for the new -pass. To do this, copy the following into ``Makefile``: +First, configure and build LLVM. Next, you need to create a new directory +somewhere in the LLVM source base. For this example, we'll assume that you +made ``lib/Transforms/Hello``. Finally, you must set up a build script +(``Makefile``) that will compile the source code for the new pass. To do this, +copy the following into ``Makefile``: .. code-block:: make @@ -206,9 +203,8 @@ As a whole, the ``.cpp`` file looks like: static RegisterPass<Hello> X("hello", "Hello World Pass", false, false); Now that it's all together, compile the file with a simple "``gmake``" command -in the local directory and you should get a new file -"``Debug+Asserts/lib/Hello.so``" under the top level directory of the LLVM -source tree (not in the local directory). Note that everything in this file is +from the top level of your build directory and you should get a new file +"``Debug+Asserts/lib/Hello.so``". Note that everything in this file is contained in an anonymous namespace --- this reflects the fact that passes are self contained units that do not need external interfaces (although they can have them) to be useful. @@ -228,7 +224,7 @@ will work): .. code-block:: console - $ opt -load ../../../Debug+Asserts/lib/Hello.so -hello < hello.bc > /dev/null + $ opt -load ../../Debug+Asserts/lib/Hello.so -hello < hello.bc > /dev/null Hello: __main Hello: puts Hello: main @@ -245,7 +241,7 @@ To see what happened to the other string you registered, try running .. code-block:: console - $ opt -load ../../../Debug+Asserts/lib/Hello.so -help + $ opt -load ../../Debug+Asserts/lib/Hello.so -help OVERVIEW: llvm .bc -> .bc modular optimizer USAGE: opt [options] <input bitcode> @@ -272,7 +268,7 @@ you queue up. For example: .. code-block:: console - $ opt -load ../../../Debug+Asserts/lib/Hello.so -hello -time-passes < hello.bc > /dev/null + $ opt -load ../../Debug+Asserts/lib/Hello.so -hello -time-passes < hello.bc > /dev/null Hello: __main Hello: puts Hello: main @@ -1092,7 +1088,7 @@ passes. Lets try it out with the gcse and licm passes: .. code-block:: console - $ opt -load ../../../Debug+Asserts/lib/Hello.so -gcse -licm --debug-pass=Structure < hello.bc > /dev/null + $ opt -load ../../Debug+Asserts/lib/Hello.so -gcse -licm --debug-pass=Structure < hello.bc > /dev/null Module Pass Manager Function Pass Manager Dominator Set Construction @@ -1129,7 +1125,7 @@ Lets see how this changes when we run the :ref:`Hello World .. code-block:: console - $ opt -load ../../../Debug+Asserts/lib/Hello.so -gcse -hello -licm --debug-pass=Structure < hello.bc > /dev/null + $ opt -load ../../Debug+Asserts/lib/Hello.so -gcse -hello -licm --debug-pass=Structure < hello.bc > /dev/null Module Pass Manager Function Pass Manager Dominator Set Construction @@ -1170,7 +1166,7 @@ Now when we run our pass, we get this output: .. 
code-block:: console - $ opt -load ../../../Debug+Asserts/lib/Hello.so -gcse -hello -licm --debug-pass=Structure < hello.bc > /dev/null + $ opt -load ../../Debug+Asserts/lib/Hello.so -gcse -hello -licm --debug-pass=Structure < hello.bc > /dev/null Pass Arguments: -gcse -hello -licm Module Pass Manager Function Pass Manager diff --git a/docs/_ocamldoc/style.css b/docs/_ocamldoc/style.css new file mode 100644 index 000000000000..00595d7f2f29 --- /dev/null +++ b/docs/_ocamldoc/style.css @@ -0,0 +1,97 @@ +/* A style for ocamldoc. Daniel C. Buenzli */ + +/* Reset a few things. */ +html,body,div,span,applet,object,iframe,h1,h2,h3,h4,h5,h6,p,blockquote,pre, +a,abbr,acronym,address,big,cite,code,del,dfn,em,font,img,ins,kbd,q,s,samp, +small,strike,strong,sub,sup,tt,var,b,u,i,center,dl,dt,dd,ol,ul,li,fieldset, +form,label,legend,table,caption,tbody,tfoot,thead,tr,th,td +{ margin: 0; padding: 0; border: 0 none; outline: 0; font-size: 100%; + font-weight: inherit; font-style:inherit; font-family:inherit; + line-height: inherit; vertical-align: baseline; text-align:inherit; + color:inherit; background: transparent; } + +table { border-collapse: collapse; border-spacing: 0; } + +/* Basic page layout */ + +body { font: normal 10pt/1.375em helvetica, arial, sans-serif; text-align:left; + margin: 1.375em 10%; min-width: 40ex; max-width: 72ex; + color: black; background: transparent /* url(line-height-22.gif) */; } + +b { font-weight: bold } +em { font-style: italic } + +tt, code, pre { font-family: WorkAroundWebKitAndMozilla, monospace; + font-size: 1em; } +pre code { font-size : inherit; } +.codepre { margin-bottom:1.375em /* after code example we introduce space. */ } + +.superscript,.subscript +{ font-size : 0.813em; line-height:0; margin-left:0.4ex;} +.superscript { vertical-align: super; } +.subscript { vertical-align: sub; } + +/* ocamldoc markup workaround hacks */ + + + +hr, hr + br, div + br, center + br, span + br, ul + br, ol + br, pre + br +{ display: none } /* annoying */ + +div.info + br { display:block} + +.codepre br + br { display: none } +h1 + pre { margin-bottom:1.375em} /* Toplevel module description */ + +/* Sections and document divisions */ + +/* .navbar { margin-bottom: -1.375em } */ +h1 { font-weight: bold; font-size: 1.5em; /* margin-top:1.833em; */ + margin-top:0.917em; padding-top:0.875em; + border-top-style:solid; border-width:1px; border-color:#AAA; } +h2 { font-weight: bold; font-size: 1.313em; margin-top: 1.048em } +h3 { font-weight: bold; font-size: 1.125em; margin-top: 1.222em } +h3 { font-weight: bold; font-size: 1em; margin-top: 1.375em} +h4 { font-style: italic; } + +/* Used by OCaml's own library documentation. */ + h6 { font-weight: bold; font-size: 1.125em; margin-top: 1.222em } + .h7 { font-weight: bold; font-size: 1em; margin-top: 1.375em } + +p { margin-top: 1.375em } +pre { margin-top: 1.375em } +.info { margin: 0.458em 0em -0.458em 2em;}/* Description of types values etc. 
*/ +td .info { margin:0; padding:0; margin-left: 2em;} /* Description in indexes */ + +ul, ol { margin-top:0.688em; padding-bottom:0.687em; + list-style-position:outside} +ul + p, ol + p { margin-top: 0em } +ul { list-style-type: square } + + +/* h2 + ul, h3 + ul, p + ul { } */ +ul > li { margin-left: 1.375em; } +ol > li { margin-left: 1.7em; } +/* Links */ + +a, a:link, a:visited, a:active, a:hover { color : #00B; text-decoration: none } +a:hover { text-decoration : underline } +*:target {background-color: #FFFF99;} /* anchor highlight */ + +/* Code */ + +.keyword { font-weight: bold; } +.comment { color : red } +.constructor { color : green } +.string { color : brown } +.warning { color : red ; font-weight : bold } + +/* Functors */ + +.paramstable { border-style : hidden ; padding-bottom:1.375em} +.paramstable code { margin-left: 1ex; margin-right: 1ex } +.sig_block {margin-left: 1em} + +/* Images */ + +img { margin-top: 1.375em } diff --git a/docs/conf.py b/docs/conf.py index 27919c20a7a5..6e3f16ceef1a 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -48,9 +48,9 @@ copyright = u'2003-%d, LLVM Project' % date.today().year # built documents. # # The short X.Y version. -version = '3.7' +version = '3.8' # The full version, including alpha/beta/rc tags. -release = '3.7' +release = '3.8' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/docs/doxygen.cfg.in b/docs/doxygen.cfg.in index 5c70db0332d5..5a74cecc8aac 100644 --- a/docs/doxygen.cfg.in +++ b/docs/doxygen.cfg.in @@ -1409,7 +1409,7 @@ FORMULA_TRANSPARENT = YES # The default value is: NO. # This tag requires that the tag GENERATE_HTML is set to YES. -USE_MATHJAX = NO +USE_MATHJAX = YES # When MathJax is enabled you can set the default output format to be used for # the MathJax output. See the MathJax site (see: diff --git a/docs/index.rst b/docs/index.rst index 66c55758c4db..a69ecfedc580 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,6 +1,11 @@ Overview ======== +.. warning:: + + If you are using a released version of LLVM, see `the download page + <http://llvm.org/releases/>`_ to find your documentation. + The LLVM compiler infrastructure supports a wide range of projects, from industrial strength compilers to specialized JIT applications to small research projects. @@ -81,6 +86,7 @@ representation. GetElementPtr Frontend/PerformanceTips MCJITDesignAndImplementation + CompileCudaWithLLVM :doc:`GettingStarted` Discusses how to get up and running quickly with the LLVM infrastructure. @@ -256,6 +262,7 @@ For API clients and LLVM developers. MergeFunctions BitSets FaultMaps + MIRLangRef :doc:`WritingAnLLVMPass` Information on how to write LLVM transformations and analyses. @@ -268,6 +275,10 @@ For API clients and LLVM developers. working on retargetting LLVM to a new architecture, designing a new codegen pass, or enhancing existing components. +:doc:`Machine IR (MIR) Format Reference Manual <MIRLangRef>` + A reference manual for the MIR serialization format, which is used to test + LLVM's code generation passes. + :doc:`TableGen <TableGen/index>` Describes the TableGen tool, which is used heavily by the LLVM code generator. @@ -361,6 +372,9 @@ For API clients and LLVM developers. :doc:`FaultMaps` LLVM support for folding control flow into faulting machine instructions. +:doc:`CompileCudaWithLLVM` + LLVM support for CUDA. 
+ Development Process Documentation ================================= diff --git a/docs/tutorial/LangImpl1.rst b/docs/tutorial/LangImpl1.rst index f4b019166af3..b04cde10274e 100644 --- a/docs/tutorial/LangImpl1.rst +++ b/docs/tutorial/LangImpl1.rst @@ -25,7 +25,7 @@ It is useful to point out ahead of time that this tutorial is really about teaching compiler techniques and LLVM specifically, *not* about teaching modern and sane software engineering principles. In practice, this means that we'll take a number of shortcuts to simplify the -exposition. For example, the code leaks memory, uses global variables +exposition. For example, the code uses global variables all over the place, doesn't use nice design patterns like `visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but it is very simple. If you dig in and use the code as a basis for future @@ -146,7 +146,7 @@ useful for mutually recursive functions). For example: A more interesting example is included in Chapter 6 where we write a little Kaleidoscope application that `displays a Mandelbrot -Set <LangImpl6.html#example>`_ at various levels of magnification. +Set <LangImpl6.html#kicking-the-tires>`_ at various levels of magnification. Lets dive into the implementation of this language! @@ -169,14 +169,16 @@ numeric value of a number). First, we define the possibilities: tok_eof = -1, // commands - tok_def = -2, tok_extern = -3, + tok_def = -2, + tok_extern = -3, // primary - tok_identifier = -4, tok_number = -5, + tok_identifier = -4, + tok_number = -5, }; - static std::string IdentifierStr; // Filled in if tok_identifier - static double NumVal; // Filled in if tok_number + static std::string IdentifierStr; // Filled in if tok_identifier + static double NumVal; // Filled in if tok_number Each token returned by our lexer will either be one of the Token enum values or it will be an 'unknown' character like '+', which is returned @@ -217,8 +219,10 @@ loop: while (isalnum((LastChar = getchar()))) IdentifierStr += LastChar; - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; + if (IdentifierStr == "def") + return tok_def; + if (IdentifierStr == "extern") + return tok_extern; return tok_identifier; } @@ -250,7 +254,8 @@ extend it :). Next we handle comments: if (LastChar == '#') { // Comment until end of line. - do LastChar = getchar(); + do + LastChar = getchar(); while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); if (LastChar != EOF) @@ -275,7 +280,7 @@ file. These are handled with this code: } With this, we have the complete lexer for the basic Kaleidoscope -language (the `full code listing <LangImpl2.html#code>`_ for the Lexer +language (the `full code listing <LangImpl2.html#full-code-listing>`_ for the Lexer is available in the `next chapter <LangImpl2.html>`_ of the tutorial). Next we'll `build a simple parser that uses this to build an Abstract Syntax Tree <LangImpl2.html>`_. When we have that, we'll include a diff --git a/docs/tutorial/LangImpl2.rst b/docs/tutorial/LangImpl2.rst index 06b18ff6c239..dab60172b988 100644 --- a/docs/tutorial/LangImpl2.rst +++ b/docs/tutorial/LangImpl2.rst @@ -44,8 +44,9 @@ We'll start with expressions first: /// NumberExprAST - Expression class for numeric literals like "1.0". 
class NumberExprAST : public ExprAST { double Val; + public: - NumberExprAST(double val) : Val(val) {} + NumberExprAST(double Val) : Val(Val) {} }; The code above shows the definition of the base ExprAST class and one @@ -65,26 +66,31 @@ language: /// VariableExprAST - Expression class for referencing a variable, like "a". class VariableExprAST : public ExprAST { std::string Name; + public: - VariableExprAST(const std::string &name) : Name(name) {} + VariableExprAST(const std::string &Name) : Name(Name) {} }; /// BinaryExprAST - Expression class for a binary operator. class BinaryExprAST : public ExprAST { char Op; - ExprAST *LHS, *RHS; + std::unique_ptr<ExprAST> LHS, RHS; + public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} + BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS, + std::unique_ptr<ExprAST> RHS) + : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {} }; /// CallExprAST - Expression class for function calls. class CallExprAST : public ExprAST { std::string Callee; - std::vector<ExprAST*> Args; + std::vector<std::unique_ptr<ExprAST>> Args; + public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} + CallExprAST(const std::string &Callee, + std::vector<std::unique_ptr<ExprAST>> Args) + : Callee(Callee), Args(std::move(Args)) {} }; This is all (intentionally) rather straight-forward: variables capture @@ -109,18 +115,21 @@ way to talk about functions themselves: class PrototypeAST { std::string Name; std::vector<std::string> Args; + public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} + PrototypeAST(const std::string &name, std::vector<std::string> Args) + : Name(name), Args(std::move(Args)) {} }; /// FunctionAST - This class represents a function definition itself. class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; + std::unique_ptr<PrototypeAST> Proto; + std::unique_ptr<ExprAST> Body; + public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} + FunctionAST(std::unique_ptr<PrototypeAST> Proto, + std::unique_ptr<ExprAST> Body) + : Proto(std::move(Proto)), Body(std::move(Body)) {} }; In Kaleidoscope, functions are typed with just a count of their @@ -142,9 +151,10 @@ be generated with calls like this: .. code-block:: c++ - ExprAST *X = new VariableExprAST("x"); - ExprAST *Y = new VariableExprAST("y"); - ExprAST *Result = new BinaryExprAST('+', X, Y); + auto LHS = llvm::make_unique<VariableExprAST>("x"); + auto RHS = llvm::make_unique<VariableExprAST>("y"); + auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS), + std::move(RHS)); In order to do this, we'll start by defining some basic helper routines: @@ -167,9 +177,14 @@ be parsed. /// Error* - These are little helper functions for error handling. - ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} - PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } - FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } + std::unique_ptr<ExprAST> Error(const char *Str) { + fprintf(stderr, "Error: %s\n", Str); + return nullptr; + } + std::unique_ptr<PrototypeAST> ErrorP(const char *Str) { + Error(Str); + return nullptr; + } The ``Error`` routines are simple helper routines that our parser will use to handle errors. The error recovery in our parser will not be the @@ -190,10 +205,10 @@ which parses that production. For numeric literals, we have: .. 
code-block:: c++ /// numberexpr ::= number - static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); + static std::unique_ptr<ExprAST> ParseNumberExpr() { + auto Result = llvm::make_unique<NumberExprAST>(NumVal); getNextToken(); // consume the number - return Result; + return std::move(Result); } This routine is very simple: it expects to be called when the current @@ -211,14 +226,15 @@ the parenthesis operator is defined like this: .. code-block:: c++ /// parenexpr ::= '(' expression ')' - static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; + static std::unique_ptr<ExprAST> ParseParenExpr() { + getNextToken(); // eat (. + auto V = ParseExpression(); + if (!V) + return nullptr; if (CurTok != ')') return Error("expected ')'"); - getNextToken(); // eat ). + getNextToken(); // eat ). return V; } @@ -250,24 +266,26 @@ function calls: /// identifierexpr /// ::= identifier /// ::= identifier '(' expression* ')' - static ExprAST *ParseIdentifierExpr() { + static std::unique_ptr<ExprAST> ParseIdentifierExpr() { std::string IdName = IdentifierStr; getNextToken(); // eat identifier. if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); + return llvm::make_unique<VariableExprAST>(IdName); // Call. getNextToken(); // eat ( - std::vector<ExprAST*> Args; + std::vector<std::unique_ptr<ExprAST>> Args; if (CurTok != ')') { while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); + if (auto Arg = ParseExpression()) + Args.push_back(std::move(Arg)); + else + return nullptr; - if (CurTok == ')') break; + if (CurTok == ')') + break; if (CurTok != ',') return Error("Expected ')' or ',' in argument list"); @@ -278,7 +296,7 @@ function calls: // Eat the ')'. getNextToken(); - return new CallExprAST(IdName, Args); + return llvm::make_unique<CallExprAST>(IdName, std::move(Args)); } This routine follows the same style as the other routines. (It expects @@ -294,7 +312,7 @@ Now that we have all of our simple expression-parsing logic in place, we can define a helper function to wrap it together into one entry point. We call this class of expressions "primary" expressions, for reasons that will become more clear `later in the -tutorial <LangImpl6.html#unary>`_. In order to parse an arbitrary +tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary primary expression, we need to determine what sort of expression it is: .. code-block:: c++ @@ -303,12 +321,16 @@ primary expression, we need to determine what sort of expression it is: /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr - static ExprAST *ParsePrimary() { + static std::unique_ptr<ExprAST> ParsePrimary() { switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); + default: + return Error("unknown token when expecting an expression"); + case tok_identifier: + return ParseIdentifierExpr(); + case tok_number: + return ParseNumberExpr(); + case '(': + return ParseParenExpr(); } } @@ -374,7 +396,7 @@ would be easy enough to eliminate the map and do the comparisons in the With the helper above defined, we can now start parsing binary expressions. The basic idea of operator precedence parsing is to break down an expression with potentially ambiguous binary operators into -pieces. 
Consider ,for example, the expression "a+b+(c+d)\*e\*f+g". +pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g". Operator precedence parsing considers this as a stream of primary expressions separated by binary operators. As such, it will first parse the leading primary expression "a", then it will see the pairs [+, b] @@ -390,11 +412,12 @@ a sequence of [binop,primaryexpr] pairs: /// expression /// ::= primary binoprhs /// - static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; + static std::unique_ptr<ExprAST> ParseExpression() { + auto LHS = ParsePrimary(); + if (!LHS) + return nullptr; - return ParseBinOpRHS(0, LHS); + return ParseBinOpRHS(0, std::move(LHS)); } ``ParseBinOpRHS`` is the function that parses the sequence of pairs for @@ -416,7 +439,8 @@ starts with: /// binoprhs /// ::= ('+' primary)* - static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, + std::unique_ptr<ExprAST> LHS) { // If this is a binop, find its precedence. while (1) { int TokPrec = GetTokPrecedence(); @@ -440,8 +464,9 @@ expression: getNextToken(); // eat binop // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; + auto RHS = ParsePrimary(); + if (!RHS) + return nullptr; As such, this code eats (and remembers) the binary operator and then parses the primary expression that follows. This builds up the whole @@ -474,7 +499,8 @@ then continue parsing: } // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); + LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), + std::move(RHS)); } // loop around to the top of the while loop. } @@ -498,11 +524,13 @@ above two blocks duplicated for context): // the pending operator take RHS as its LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; + RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS)); + if (!RHS) + return nullptr; } // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); + LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), + std::move(RHS)); } // loop around to the top of the while loop. } @@ -541,7 +569,7 @@ expressions): /// prototype /// ::= id '(' id* ')' - static PrototypeAST *ParsePrototype() { + static std::unique_ptr<PrototypeAST> ParsePrototype() { if (CurTok != tok_identifier) return ErrorP("Expected function name in prototype"); @@ -561,7 +589,7 @@ expressions): // success. getNextToken(); // eat ')'. - return new PrototypeAST(FnName, ArgNames); + return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames)); } Given this, a function definition is very simple, just a prototype plus @@ -570,14 +598,14 @@ an expression to implement the body: .. code-block:: c++ /// definition ::= 'def' prototype expression - static FunctionAST *ParseDefinition() { + static std::unique_ptr<FunctionAST> ParseDefinition() { getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; + auto Proto = ParsePrototype(); + if (!Proto) return nullptr; - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; + if (auto E = ParseExpression()) + return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); + return nullptr; } In addition, we support 'extern' to declare functions like 'sin' and @@ -587,7 +615,7 @@ In addition, we support 'extern' to declare functions like 'sin' and .. 
code-block:: c++ /// external ::= 'extern' prototype - static PrototypeAST *ParseExtern() { + static std::unique_ptr<PrototypeAST> ParseExtern() { getNextToken(); // eat extern. return ParsePrototype(); } @@ -599,13 +627,13 @@ nullary (zero argument) functions for them: .. code-block:: c++ /// toplevelexpr ::= expression - static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { + static std::unique_ptr<FunctionAST> ParseTopLevelExpr() { + if (auto E = ParseExpression()) { // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); + auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); + return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); } - return 0; + return nullptr; } Now that we have all the pieces, let's build a little driver that will @@ -616,7 +644,7 @@ The Driver The driver for this simply invokes all of the parsing pieces with a top-level dispatch loop. There isn't much interesting here, so I'll just -include the top-level loop. See `below <#code>`_ for full code in the +include the top-level loop. See `below <#full-code-listing>`_ for full code in the "Top-Level Parsing" section. .. code-block:: c++ @@ -626,11 +654,20 @@ include the top-level loop. See `below <#code>`_ for full code in the while (1) { fprintf(stderr, "ready> "); switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; + case tok_eof: + return; + case ';': // ignore top-level semicolons. + getNextToken(); + break; + case tok_def: + HandleDefinition(); + break; + case tok_extern: + HandleExtern(); + break; + default: + HandleTopLevelExpression(); + break; } } } diff --git a/docs/tutorial/LangImpl3.rst b/docs/tutorial/LangImpl3.rst index 26ba4aae956c..83ad35f14aee 100644 --- a/docs/tutorial/LangImpl3.rst +++ b/docs/tutorial/LangImpl3.rst @@ -15,8 +15,8 @@ LLVM IR. This will teach you a little bit about how LLVM does things, as well as demonstrate how easy it is to use. It's much more work to build a lexer and parser than it is to generate LLVM IR code. :) -**Please note**: the code in this chapter and later require LLVM 2.2 or -later. LLVM 2.1 and before will not work with it. Also note that you +**Please note**: the code in this chapter and later require LLVM 3.7 or +later. LLVM 3.6 and before will not work with it. Also note that you need to use a version of this tutorial that matches your LLVM release: If you are using an official LLVM release, use the version of the documentation included with your release or on the `llvm.org releases @@ -35,19 +35,20 @@ class: class ExprAST { public: virtual ~ExprAST() {} - virtual Value *Codegen() = 0; + virtual Value *codegen() = 0; }; /// NumberExprAST - Expression class for numeric literals like "1.0". class NumberExprAST : public ExprAST { double Val; + public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); + NumberExprAST(double Val) : Val(Val) {} + virtual Value *codegen(); }; ... -The Codegen() method says to emit IR for that AST node along with all +The codegen() method says to emit IR for that AST node along with all the things it depends on, and they all return an LLVM Value object. 
"Value" is the class used to represent a "`Static Single Assignment (SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ @@ -72,16 +73,20 @@ parser, which will be used to report errors found during code generation .. code-block:: c++ - Value *ErrorV(const char *Str) { Error(Str); return 0; } - - static Module *TheModule; + static std::unique_ptr<Module> *TheModule; static IRBuilder<> Builder(getGlobalContext()); static std::map<std::string, Value*> NamedValues; + Value *ErrorV(const char *Str) { + Error(Str); + return nullptr; + } + The static variables will be used during code generation. ``TheModule`` -is the LLVM construct that contains all of the functions and global -variables in a chunk of code. In many ways, it is the top-level -structure that the LLVM IR uses to contain code. +is an LLVM construct that contains functions and global variables. In many +ways, it is the top-level structure that the LLVM IR uses to contain code. +It will own the memory for all of the IR that we generate, which is why +the codegen() method returns a raw Value\*, rather than a unique_ptr<Value>. The ``Builder`` object is a helper object that makes it easy to generate LLVM instructions. Instances of the @@ -110,7 +115,7 @@ First we'll do numeric literals: .. code-block:: c++ - Value *NumberExprAST::Codegen() { + Value *NumberExprAST::codegen() { return ConstantFP::get(getGlobalContext(), APFloat(Val)); } @@ -124,10 +129,12 @@ are all uniqued together and shared. For this reason, the API uses the .. code-block:: c++ - Value *VariableExprAST::Codegen() { + Value *VariableExprAST::codegen() { // Look this variable up in the function. Value *V = NamedValues[Name]; - return V ? V : ErrorV("Unknown variable name"); + if (!V) + ErrorV("Unknown variable name"); + return V; } References to variables are also quite simple using LLVM. In the simple @@ -137,26 +144,31 @@ values that can be in the ``NamedValues`` map are function arguments. This code simply checks to see that the specified name is in the map (if not, an unknown variable is being referenced) and returns the value for it. In future chapters, we'll add support for `loop induction -variables <LangImpl5.html#for>`_ in the symbol table, and for `local -variables <LangImpl7.html#localvars>`_. +variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for `local +variables <LangImpl7.html#user-defined-local-variables>`_. .. code-block:: c++ - Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; + Value *BinaryExprAST::codegen() { + Value *L = LHS->codegen(); + Value *R = RHS->codegen(); + if (!L || !R) + return nullptr; switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); + case '+': + return Builder.CreateFAdd(L, R, "addtmp"); + case '-': + return Builder.CreateFSub(L, R, "subtmp"); + case '*': + return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); // Convert bool 0/1 to double 0.0 or 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); - default: return ErrorV("invalid binary operator"); + default: + return ErrorV("invalid binary operator"); } } @@ -178,55 +190,55 @@ automatically provide each one with an increasing, unique numeric suffix. Local value names for instructions are purely optional, but it makes it much easier to read the IR dumps. 
-`LLVM instructions <../LangRef.html#instref>`_ are constrained by strict +`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict rules: for example, the Left and Right operators of an `add -instruction <../LangRef.html#i_add>`_ must have the same type, and the +instruction <../LangRef.html#add-instruction>`_ must have the same type, and the result type of the add must match the operand types. Because all values in Kaleidoscope are doubles, this makes for very simple code for add, sub and mul. On the other hand, LLVM specifies that the `fcmp -instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a +instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with a `uitofp -instruction <../LangRef.html#i_uitofp>`_. This instruction converts its +instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its input integer into a floating point value by treating the input as an unsigned value. In contrast, if we used the `sitofp -instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator +instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator would return 0.0 and -1.0, depending on the input value. .. code-block:: c++ - Value *CallExprAST::Codegen() { + Value *CallExprAST::codegen() { // Look up the name in the global module table. Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) + if (!CalleeF) return ErrorV("Unknown function referenced"); // If argument mismatch error. if (CalleeF->arg_size() != Args.size()) return ErrorV("Incorrect # arguments passed"); - std::vector<Value*> ArgsV; + std::vector<Value *> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; + ArgsV.push_back(Args[i]->codegen()); + if (!ArgsV.back()) + return nullptr; } return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); } -Code generation for function calls is quite straightforward with LLVM. -The code above initially does a function name lookup in the LLVM -Module's symbol table. Recall that the LLVM Module is the container that -holds all of the functions we are JIT'ing. By giving each function the -same name as what the user specifies, we can use the LLVM symbol table -to resolve function names for us. +Code generation for function calls is quite straightforward with LLVM. The code +above initially does a function name lookup in the LLVM Module's symbol table. +Recall that the LLVM Module is the container that holds the functions we are +JIT'ing. By giving each function the same name as what the user specifies, we +can use the LLVM symbol table to resolve function names for us. Once we have the function to call, we recursively codegen each argument that is to be passed in, and create an LLVM `call -instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C +instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C calling conventions by default, allowing these calls to also call into standard library functions like "sin" and "cos", with no additional effort. @@ -249,14 +261,15 @@ with: .. code-block:: c++ - Function *PrototypeAST::Codegen() { + Function *PrototypeAST::codegen() { // Make the function type: double(double,double) etc. 
std::vector<Type*> Doubles(Args.size(), Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); + FunctionType *FT = + FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); + Function *F = + Function::Create(FT, Function::ExternalLinkage, Name, TheModule); This code packs a lot of power into a few lines. Note first that this function returns a "Function\*" instead of a "Value\*". Because a @@ -273,118 +286,67 @@ double as a result, and that is not vararg (the false parameter indicates this). Note that Types in LLVM are uniqued just like Constants are, so you don't "new" a type, you "get" it. -The final line above actually creates the function that the prototype -will correspond to. This indicates the type, linkage and name to use, as +The final line above actually creates the IR Function corresponding to +the Prototype. This indicates the type, linkage and name to use, as well as which module to insert into. "`external linkage <../LangRef.html#linkage>`_" means that the function may be defined outside the current module and/or that it is callable by functions outside the module. The Name passed in is the name the user specified: since "``TheModule``" is specified, this name is registered -in "``TheModule``"s symbol table, which is used by the function call -code above. +in "``TheModule``"s symbol table. .. code-block:: c++ - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - -The Module symbol table works just like the Function symbol table when -it comes to name conflicts: if a new function is created with a name -that was previously added to the symbol table, the new function will get -implicitly renamed when added to the Module. The code above exploits -this fact to determine if there was a previous definition of this -function. - -In Kaleidoscope, I choose to allow redefinitions of functions in two -cases: first, we want to allow 'extern'ing a function more than once, as -long as the prototypes for the externs match (since all arguments have -the same type, we just have to check that the number of arguments -match). Second, we want to allow 'extern'ing a function and then -defining a body for it. This is useful when defining mutually recursive -functions. - -In order to implement this, the code above first checks to see if there -is a collision on the name of the function. If so, it deletes the -function we just created (by calling ``eraseFromParent``) and then -calling ``getFunction`` to get the existing function with the specified -name. Note that many APIs in LLVM have "erase" forms and "remove" forms. -The "remove" form unlinks the object from its parent (e.g. a Function -from a Module) and returns it. The "erase" form unlinks the object and -then deletes it. + // Set names for all arguments. + unsigned Idx = 0; + for (auto &Arg : F->args()) + Arg.setName(Args[Idx++]); -.. code-block:: c++ + return F; - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. 
- if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } +Finally, we set the name of each of the function's arguments according to the +names given in the Prototype. This step isn't strictly necessary, but keeping +the names consistent makes the IR more readable, and allows subsequent code to +refer directly to the arguments for their names, rather than having to look up +them up in the Prototype AST. -In order to verify the logic above, we first check to see if the -pre-existing function is "empty". In this case, empty means that it has -no basic blocks in it, which means it has no body. If it has no body, it -is a forward declaration. Since we don't allow anything after a full -definition of the function, the code rejects this case. If the previous -reference to a function was an 'extern', we simply verify that the -number of arguments for that definition and this one match up. If not, -we emit an error. +At this point we have a function prototype with no body. This is how LLVM IR +represents function declarations. For extern statements in Kaleidoscope, this +is as far as we need to go. For function definitions however, we need to +codegen and attach a function body. .. code-block:: c++ - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); + Function *FunctionAST::codegen() { + // First, check for an existing function from a previous 'extern' declaration. + Function *TheFunction = TheModule->getFunction(Proto->getName()); - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - return F; - } + if (!TheFunction) + TheFunction = Proto->codegen(); -The last bit of code for prototypes loops over all of the arguments in -the function, setting the name of the LLVM Argument objects to match, -and registering the arguments in the ``NamedValues`` map for future use -by the ``VariableExprAST`` AST node. Once this is set up, it returns the -Function object to the caller. Note that we don't check for conflicting -argument names here (e.g. "extern foo(a b a)"). Doing so would be very -straight-forward with the mechanics we have already used above. + if (!TheFunction) + return nullptr; -.. code-block:: c++ + if (!TheFunction->empty()) + return (Function*)ErrorV("Function cannot be redefined."); - Function *FunctionAST::Codegen() { - NamedValues.clear(); - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - -Code generation for function definitions starts out simply enough: we -just codegen the prototype (Proto) and verify that it is ok. We then -clear out the ``NamedValues`` map to make sure that there isn't anything -in it from the last function we compiled. Code generation of the -prototype ensures that there is an LLVM Function object that is ready to -go for us. +For function definitions, we start by searching TheModule's symbol table for an +existing version of this function, in case one has already been created using an +'extern' statement. If Module::getFunction returns null then no previous version +exists, so we'll codegen one from the Prototype. In either case, we want to +assert that the function is empty (i.e. has no body yet) before we start. .. code-block:: c++ - // Create a new basic block to start insertion into. 
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); - if (Value *RetVal = Body->Codegen()) { + // Record the function arguments in the NamedValues map. + NamedValues.clear(); + for (auto &Arg : TheFunction->args()) + NamedValues[Arg.getName()] = &Arg; Now we get to the point where the ``Builder`` is set up. The first line creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_ @@ -396,9 +358,12 @@ Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we don't have any control flow, our functions will only contain one block at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :). +Next we add the function arguments to the NamedValues map (after first clearing +it out) so that they're accessible to ``VariableExprAST`` nodes. + .. code-block:: c++ - if (Value *RetVal = Body->Codegen()) { + if (Value *RetVal = Body->codegen()) { // Finish off the function. Builder.CreateRet(RetVal); @@ -408,11 +373,11 @@ at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :). return TheFunction; } -Once the insertion point is set up, we call the ``CodeGen()`` method for -the root expression of the function. If no error happens, this emits -code to compute the expression into the entry block and returns the -value that was computed. Assuming no error, we then create an LLVM `ret -instruction <../LangRef.html#i_ret>`_, which completes the function. +Once the insertion point has been set up and the NamedValues map populated, +we call the ``codegen()`` method for the root expression of the function. If no +error happens, this emits code to compute the expression into the entry block +and returns the value that was computed. Assuming no error, we then create an +LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the function. Once the function is built, we call ``verifyFunction``, which is provided by LLVM. This function does a variety of consistency checks on the generated code, to determine if our compiler is doing everything @@ -423,7 +388,7 @@ function is finished and validated, we return it. // Error reading body, remove function. TheFunction->eraseFromParent(); - return 0; + return nullptr; } The only piece left here is handling of the error case. For simplicity, @@ -432,23 +397,25 @@ we handle this by merely deleting the function we produced with the that they incorrectly typed in before: if we didn't delete it, it would live in the symbol table, with a body, preventing future redefinition. -This code does have a bug, though. Since the ``PrototypeAST::Codegen`` -can return a previously defined forward declaration, our code can -actually delete a forward declaration. There are a number of ways to fix -this bug, see what you can come up with! Here is a testcase: +This code does have a bug, though: If the ``FunctionAST::codegen()`` method +finds an existing IR Function, it does not validate its signature against the +definition's own prototype. This means that an earlier 'extern' declaration will +take precedence over the function definition's signature, which can cause +codegen to fail, for instance if the function arguments are named differently. +There are a number of ways to fix this bug, see what you can come up with! Here +is a testcase: :: - extern foo(a b); # ok, defines foo. 
- def foo(a b) c; # error, 'c' is invalid. - def bar() foo(1, 2); # error, unknown function "foo" + extern foo(a); # ok, defines foo. + def foo(b) b; # Error: Unknown variable name. (decl using 'a' takes precedence). Driver Changes and Closing Thoughts =================================== For now, code generation to LLVM doesn't really get us much, except that we can look at the pretty IR calls. The sample code inserts calls to -Codegen into the "``HandleDefinition``", "``HandleExtern``" etc +codegen into the "``HandleDefinition``", "``HandleExtern``" etc functions, and then dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple functions. For example: @@ -463,10 +430,10 @@ at the LLVM IR for simple functions. For example: Note how the parser turns the top-level expression into anonymous functions for us. This will be handy when we add `JIT -support <LangImpl4.html#jit>`_ in the next chapter. Also note that the +support <LangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the code is very literally transcribed, no optimizations are being performed except simple constant folding done by IRBuilder. We will `add -optimizations <LangImpl4.html#trivialconstfold>`_ explicitly in the next +optimizations <LangImpl4.html#trivial-constant-folding>`_ explicitly in the next chapter. :: diff --git a/docs/tutorial/LangImpl4.rst b/docs/tutorial/LangImpl4.rst index cdaac634dd76..a671d0c37f9d 100644 --- a/docs/tutorial/LangImpl4.rst +++ b/docs/tutorial/LangImpl4.rst @@ -120,57 +120,53 @@ exactly the code we have now, except that we would defer running the optimizer until the entire file has been parsed. In order to get per-function optimizations going, we need to set up a -`FunctionPassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold +`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-doesr>`_ to hold and organize the LLVM optimizations that we want to run. Once we have -that, we can add a set of optimizations to run. The code looks like -this: +that, we can add a set of optimizations to run. We'll need a new +FunctionPassManager for each module that we want to optimize, so we'll +write a function to create and initialize both the module and pass manager +for us: .. code-block:: c++ - FunctionPassManager OurFPM(TheModule); + void InitializeModuleAndPassManager(void) { + // Open a new module. + TheModule = llvm::make_unique<Module>("my cool jit", getGlobalContext()); + TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout()); + + // Create a new pass manager attached to it. + TheFPM = llvm::make_unique<FunctionPassManager>(TheModule.get()); - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); // Provide basic AliasAnalysis support for GVN. - OurFPM.add(createBasicAliasAnalysisPass()); + TheFPM.add(createBasicAliasAnalysisPass()); // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); + TheFPM.add(createInstructionCombiningPass()); // Reassociate expressions. - OurFPM.add(createReassociatePass()); + TheFPM.add(createReassociatePass()); // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); + TheFPM.add(createGVNPass()); // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. 
- TheFPM = &OurFPM; + TheFPM.add(createCFGSimplificationPass()); - // Run the main "interpreter loop" now. - MainLoop(); + TheFPM.doInitialization(); + } -This code defines a ``FunctionPassManager``, "``OurFPM``". It requires a -pointer to the ``Module`` to construct itself. Once it is set up, we use -a series of "add" calls to add a bunch of LLVM passes. The first pass is -basically boilerplate, it adds a pass so that later optimizations know -how the data structures in the program are laid out. The -"``TheExecutionEngine``" variable is related to the JIT, which we will -get to in the next section. +This code initializes the global module ``TheModule``, and the function pass +manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is +set up, we use a series of "add" calls to add a bunch of LLVM passes. -In this case, we choose to add 4 optimization passes. The passes we -chose here are a pretty standard set of "cleanup" optimizations that are -useful for a wide variety of code. I won't delve into what they do but, -believe me, they are a good starting place :). +In this case, we choose to add five passes: one analysis pass (alias analysis), +and four optimization passes. The passes we choose here are a pretty standard set +of "cleanup" optimizations that are useful for a wide variety of code. I won't +delve into what they do but, believe me, they are a good starting place :). Once the PassManager is set up, we need to make use of it. We do this by running it after our newly created function is constructed (in -``FunctionAST::Codegen``), but before it is returned to the client: +``FunctionAST::codegen()``), but before it is returned to the client: .. code-block:: c++ - if (Value *RetVal = Body->Codegen()) { + if (Value *RetVal = Body->codegen()) { // Finish off the function. Builder.CreateRet(RetVal); @@ -231,55 +227,85 @@ should evaluate and print out 3. If they define a function, they should be able to call it from the command line. In order to do this, we first declare and initialize the JIT. This is -done by adding a global variable and a call in ``main``: +done by adding a global variable ``TheJIT``, and initializing it in +``main``: .. code-block:: c++ - static ExecutionEngine *TheExecutionEngine; + static std::unique_ptr<KaleidoscopeJIT> TheJIT; ... int main() { .. - // Create the JIT. This takes ownership of the module. - TheExecutionEngine = EngineBuilder(TheModule).create(); - .. + TheJIT = llvm::make_unique<KaleidoscopeJIT>(); + + // Run the main "interpreter loop" now. + MainLoop(); + + return 0; } -This creates an abstract "Execution Engine" which can be either a JIT -compiler or the LLVM interpreter. LLVM will automatically pick a JIT -compiler for you if one is available for your platform, otherwise it -will fall back to the interpreter. +The KaleidoscopeJIT class is a simple JIT built specifically for these +tutorials. In later chapters we will look at how it works and extend it with +new features, but for now we will take it as given. Its API is very simple:: +``addModule`` adds an LLVM IR module to the JIT, making its functions +available for execution; ``removeModule`` removes a module, freeing any +memory associated with the code in that module; and ``findSymbol`` allows us +to look up pointers to the compiled code. -Once the ``ExecutionEngine`` is created, the JIT is ready to be used. -There are a variety of APIs that are useful, but the simplest one is the -"``getPointerToFunction(F)``" method. 
This method JIT compiles the -specified LLVM Function and returns a function pointer to the generated -machine code. In our case, this means that we can change the code that -parses a top-level expression to look like this: +We can take this simple API and change our code that parses top-level expressions to +look like this: .. code-block:: c++ static void HandleTopLevelExpression() { // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - LF->dump(); // Dump the function for exposition purposes. + if (auto FnAST = ParseTopLevelExpr()) { + if (FnAST->codegen()) { + + // JIT the module containing the anonymous expression, keeping a handle so + // we can free it later. + auto H = TheJIT->addModule(std::move(TheModule)); + InitializeModuleAndPassManager(); - // JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); + // Search the JIT for the __anon_expr symbol. + auto ExprSymbol = TheJIT->findSymbol("__anon_expr"); + assert(ExprSymbol && "Function not found"); - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; + // Get the symbol's address and cast it to the right type (takes no + // arguments, returns a double) so we can call it as a native function. + double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress(); fprintf(stderr, "Evaluated to %f\n", FP()); + + // Delete the anonymous expression module from the JIT. + TheJIT->removeModule(H); } -Recall that we compile top-level expressions into a self-contained LLVM -function that takes no arguments and returns the computed double. -Because the LLVM JIT compiler matches the native platform ABI, this -means that you can just cast the result pointer to a function pointer of -that type and call it directly. This means, there is no difference -between JIT compiled code and native machine code that is statically -linked into your application. +If parsing and codegen succeeed, the next step is to add the module containing +the top-level expression to the JIT. We do this by calling addModule, which +triggers code generation for all the functions in the module, and returns a +handle that can be used to remove the module from the JIT later. Once the module +has been added to the JIT it can no longer be modified, so we also open a new +module to hold subsequent code by calling ``InitializeModuleAndPassManager()``. + +Once we've added the module to the JIT we need to get a pointer to the final +generated code. We do this by calling the JIT's findSymbol method, and passing +the name of the top-level expression function: ``__anon_expr``. Since we just +added this function, we assert that findSymbol returned a result. + +Next, we get the in-memory address of the ``__anon_expr`` function by calling +``getAddress()`` on the symbol. Recall that we compile top-level expressions +into a self-contained LLVM function that takes no arguments and returns the +computed double. Because the LLVM JIT compiler matches the native platform ABI, +this means that you can just cast the result pointer to a function pointer of +that type and call it directly. This means, there is no difference between JIT +compiled code and native machine code that is statically linked into your +application. 
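To make the ABI point a little more concrete, the same cast-and-call pattern
works for named user functions as well. The fragment below is a hypothetical
sketch rather than part of the listing above: it assumes a one-argument
Kaleidoscope function ``foo`` has already been compiled into a module owned by
``TheJIT``. Since every Kaleidoscope value is a double, its native signature is
``double(double)``:

.. code-block:: c++

    // Hypothetical sketch: "foo" is an assumed user-defined Kaleidoscope
    // function that has already been added to TheJIT via addModule().
    auto FooSymbol = TheJIT->findSymbol("foo");
    assert(FooSymbol && "Function not found");

    // The cast must match the function's real signature: double(double).
    double (*FooFP)(double) =
        (double (*)(double))(intptr_t)FooSymbol.getAddress();
    fprintf(stderr, "foo(4.0) evaluated to %f\n", FooFP(4.0));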
+ +Finally, since we don't support re-evaluation of top-level expressions, we +remove the module from the JIT when we're done to free the associated memory. +Recall, however, that the module we created a few lines earlier (via +``InitializeModuleAndPassManager``) is still open and waiting for new code to be +added. With just these two changes, lets see how Kaleidoscope works now! @@ -320,19 +346,161 @@ demonstrates very basic functionality, but can we do more? Evaluated to 24.000000 -This illustrates that we can now call user code, but there is something -a bit subtle going on here. Note that we only invoke the JIT on the -anonymous functions that *call testfunc*, but we never invoked it on -*testfunc* itself. What actually happened here is that the JIT scanned -for all non-JIT'd functions transitively called from the anonymous -function and compiled all of them before returning from -``getPointerToFunction()``. + ready> testfunc(5, 10); + ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved! + + +Function definitions and calls also work, but something went very wrong on that +last line. The call looks valid, so what happened? As you may have guessed from +the the API a Module is a unit of allocation for the JIT, and testfunc was part +of the same module that contained anonymous expression. When we removed that +module from the JIT to free the memory for the anonymous expression, we deleted +the definition of ``testfunc`` along with it. Then, when we tried to call +testfunc a second time, the JIT could no longer find it. + +The easiest way to fix this is to put the anonymous expression in a separate +module from the rest of the function definitions. The JIT will happily resolve +function calls across module boundaries, as long as each of the functions called +has a prototype, and is added to the JIT before it is called. By putting the +anonymous expression in a different module we can delete it without affecting +the rest of the functions. + +In fact, we're going to go a step further and put every function in its own +module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT +that will make our environment more REPL-like: Functions can be added to the +JIT more than once (unlike a module where every function must have a unique +definition). When you look up a symbol in KaleidoscopeJIT it will always return +the most recent definition: + +:: + + ready> def foo(x) x + 1; + Read function definition: + define double @foo(double %x) { + entry: + %addtmp = fadd double %x, 1.000000e+00 + ret double %addtmp + } + + ready> foo(2); + Evaluated to 3.000000 + + ready> def foo(x) x + 2; + define double @foo(double %x) { + entry: + %addtmp = fadd double %x, 2.000000e+00 + ret double %addtmp + } + + ready> foo(2); + Evaluated to 4.000000 + + +To allow each function to live in its own module we'll need a way to +re-generate previous function declarations into each new module we open: + +.. code-block:: c++ + + static std::unique_ptr<KaleidoscopeJIT> TheJIT; + + ... + + Function *getFunction(std::string Name) { + // First, see if the function has already been added to the current module. + if (auto *F = TheModule->getFunction(Name)) + return F; + + // If not, check whether we can codegen the declaration from some existing + // prototype. + auto FI = FunctionProtos.find(Name); + if (FI != FunctionProtos.end()) + return FI->second->codegen(); + + // If no existing prototype exists, return null. + return nullptr; + } + + ... 
+ + Value *CallExprAST::codegen() { + // Look up the name in the global module table. + Function *CalleeF = getFunction(Callee); + + ... + + Function *FunctionAST::codegen() { + // Transfer ownership of the prototype to the FunctionProtos map, but keep a + // reference to it for use below. + auto &P = *Proto; + FunctionProtos[Proto->getName()] = std::move(Proto); + Function *TheFunction = getFunction(P.getName()); + if (!TheFunction) + return nullptr; + + +To enable this, we'll start by adding a new global, ``FunctionProtos``, that +holds the most recent prototype for each function. We'll also add a convenience +method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``. +Our convenience method searches ``TheModule`` for an existing function +declaration, falling back to generating a new declaration from FunctionProtos if +it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the +call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to +update the FunctionProtos map first, then call ``getFunction()``. With this +done, we can always obtain a function declaration in the current module for any +previously declared function. + +We also need to update HandleDefinition and HandleExtern: + +.. code-block:: c++ + + static void HandleDefinition() { + if (auto FnAST = ParseDefinition()) { + if (auto *FnIR = FnAST->codegen()) { + fprintf(stderr, "Read function definition:"); + FnIR->dump(); + TheJIT->addModule(std::move(TheModule)); + InitializeModuleAndPassManager(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleExtern() { + if (auto ProtoAST = ParseExtern()) { + if (auto *FnIR = ProtoAST->codegen()) { + fprintf(stderr, "Read extern: "); + FnIR->dump(); + FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + +In HandleDefinition, we add two lines to transfer the newly defined function to +the JIT and open a new module. In HandleExtern, we just need to add one line to +add the prototype to FunctionProtos. + +With these changes made, lets try our REPL again (I removed the dump of the +anonymous functions this time, you should get the idea by now :) : + +:: + + ready> def foo(x) x + 1; + ready> foo(2); + Evaluated to 3.000000 + + ready> def foo(x) x + 2; + ready> foo(2); + Evaluated to 4.000000 + +It works! -The JIT provides a number of other more advanced interfaces for things -like freeing allocated machine code, rejit'ing functions to update them, -etc. However, even with this simple code, we get some surprisingly -powerful capabilities - check this out (I removed the dump of the -anonymous functions, you should get the idea by now :) : +Even with this simple code, we get some surprisingly powerful capabilities - +check this out: :: @@ -375,34 +543,30 @@ anonymous functions, you should get the idea by now :) : Evaluated to 1.000000 -Whoa, how does the JIT know about sin and cos? The answer is -surprisingly simple: in this example, the JIT started execution of a -function and got to a function call. It realized that the function was -not yet JIT compiled and invoked the standard set of routines to resolve -the function. In this case, there is no body defined for the function, -so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope -process itself. Since "``sin``" is defined within the JIT's address -space, it simply patches up calls in the module to call the libm version -of ``sin`` directly. 
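The process-level lookup described here can also be reproduced outside the JIT
with LLVM's ``DynamicLibrary`` utilities. The following standalone program is a
conceptual sketch only (it is not how the Kaleidoscope JIT is implemented), and
it assumes that ``sin`` is visible in the host process, for example because
libm or libc is linked in:

.. code-block:: c++

    #include "llvm/Support/DynamicLibrary.h"
    #include <cstdint>
    #include <cstdio>

    int main() {
      // Make the symbols of the running process itself available for lookup,
      // the same idea as calling dlsym() on the Kaleidoscope binary.
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);

      if (void *Addr =
              llvm::sys::DynamicLibrary::SearchForAddressOfSymbol("sin")) {
        double (*SinFn)(double) = (double (*)(double))(intptr_t)Addr;
        std::printf("sin(1.0) = %f\n", SinFn(1.0));
      } else {
        std::printf("'sin' was not visible in this process\n");
      }
      return 0;
    }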
- -The LLVM JIT provides a number of interfaces (look in the -``ExecutionEngine.h`` file) for controlling how unknown functions get -resolved. It allows you to establish explicit mappings between IR -objects and addresses (useful for LLVM global variables that you want to -map to static tables, for example), allows you to dynamically decide on -the fly based on the function name, and even allows you to have the JIT -compile functions lazily the first time they're called. - -One interesting application of this is that we can now extend the -language by writing arbitrary C++ code to implement operations. For -example, if we add: +Whoa, how does the JIT know about sin and cos? The answer is surprisingly +simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that +it uses to find symbols that aren't available in any given module: First +it searches all the modules that have already been added to the JIT, from the +most recent to the oldest, to find the newest definition. If no definition is +found inside the JIT, it falls back to calling "``dlsym("sin")``" on the +Kaleidoscope process itself. Since "``sin``" is defined within the JIT's +address space, it simply patches up calls in the module to call the libm +version of ``sin`` directly. + +In the future we'll see how tweaking this symbol resolution rule can be used to +enable all sorts of useful features, from security (restricting the set of +symbols available to JIT'd code), to dynamic code generation based on symbol +names, and even lazy compilation. + +One immediate benefit of the symbol resolution rule is that we can now extend +the language by writing arbitrary C++ code to implement operations. For example, +if we add: .. code-block:: c++ /// putchard - putchar that takes a double and returns 0. - extern "C" - double putchard(double X) { - putchar((char)X); + extern "C" double putchard(double X) { + fputc((char)X, stderr); return 0; } diff --git a/docs/tutorial/LangImpl5.rst b/docs/tutorial/LangImpl5.rst index ca2ffebc19a2..d916f92bf99e 100644 --- a/docs/tutorial/LangImpl5.rst +++ b/docs/tutorial/LangImpl5.rst @@ -66,7 +66,9 @@ for the relevant tokens: .. code-block:: c++ // control - tok_if = -6, tok_then = -7, tok_else = -8, + tok_if = -6, + tok_then = -7, + tok_else = -8, Once we have that, we recognize the new keywords in the lexer. This is pretty simple stuff: @@ -74,11 +76,16 @@ pretty simple stuff: .. code-block:: c++ ... - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; + if (IdentifierStr == "def") + return tok_def; + if (IdentifierStr == "extern") + return tok_extern; + if (IdentifierStr == "if") + return tok_if; + if (IdentifierStr == "then") + return tok_then; + if (IdentifierStr == "else") + return tok_else; return tok_identifier; AST Extensions for If/Then/Else @@ -90,11 +97,13 @@ To represent the new expression we add a new AST node for it: /// IfExprAST - Expression class for if/then/else. 
class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; + std::unique_ptr<ExprAST> Cond, Then, Else; + public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); + IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then, + std::unique_ptr<ExprAST> Else) + : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {} + virtual Value *codegen(); }; The AST node just has pointers to the various subexpressions. @@ -109,42 +118,51 @@ First we define a new parsing function: .. code-block:: c++ /// ifexpr ::= 'if' expression 'then' expression 'else' expression - static ExprAST *ParseIfExpr() { + static std::unique_ptr<ExprAST> ParseIfExpr() { getNextToken(); // eat the if. // condition. - ExprAST *Cond = ParseExpression(); - if (!Cond) return 0; + auto Cond = ParseExpression(); + if (!Cond) + return nullptr; if (CurTok != tok_then) return Error("expected then"); getNextToken(); // eat the then - ExprAST *Then = ParseExpression(); - if (Then == 0) return 0; + auto Then = ParseExpression(); + if (!Then) + return nullptr; if (CurTok != tok_else) return Error("expected else"); getNextToken(); - ExprAST *Else = ParseExpression(); - if (!Else) return 0; + auto Else = ParseExpression(); + if (!Else) + return nullptr; - return new IfExprAST(Cond, Then, Else); + return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then), + std::move(Else)); } Next we hook it up as a primary expression: .. code-block:: c++ - static ExprAST *ParsePrimary() { + static std::unique_ptr<ExprAST> ParsePrimary() { switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); + default: + return Error("unknown token when expecting an expression"); + case tok_identifier: + return ParseIdentifierExpr(); + case tok_number: + return ParseNumberExpr(); + case '(': + return ParseParenExpr(); + case tok_if: + return ParseIfExpr(); } } @@ -196,7 +214,7 @@ Kaleidoscope looks like this: To visualize the control flow graph, you can use a nifty feature of the LLVM '`opt <http://llvm.org/cmds/opt.html>`_' tool. If you put this LLVM IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a -window will pop up <../ProgrammersManual.html#ViewGraph>`_ and you'll +window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll see this graph: .. figure:: LangImpl5-cfg.png @@ -262,19 +280,19 @@ Okay, enough of the motivation and overview, lets generate code! Code Generation for If/Then/Else -------------------------------- -In order to generate code for this, we implement the ``Codegen`` method +In order to generate code for this, we implement the ``codegen`` method for ``IfExprAST``: .. code-block:: c++ - Value *IfExprAST::Codegen() { - Value *CondV = Cond->Codegen(); - if (CondV == 0) return 0; + Value *IfExprAST::codegen() { + Value *CondV = Cond->codegen(); + if (!CondV) + return nullptr; // Convert condition to a bool by comparing equal to 0.0. - CondV = Builder.CreateFCmpONE(CondV, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "ifcond"); + CondV = Builder.CreateFCmpONE( + CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond"); This code is straightforward and similar to what we saw before. 
We emit the expression for the condition, then compare that value to zero to get @@ -286,7 +304,8 @@ a truth value as a 1-bit (bool) value. // Create blocks for the then and else cases. Insert the 'then' block at the // end of the function. - BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); + BasicBlock *ThenBB = + BasicBlock::Create(getGlobalContext(), "then", TheFunction); BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); @@ -318,8 +337,9 @@ that LLVM supports forward references. // Emit then value. Builder.SetInsertPoint(ThenBB); - Value *ThenV = Then->Codegen(); - if (ThenV == 0) return 0; + Value *ThenV = Then->codegen(); + if (!ThenV) + return nullptr; Builder.CreateBr(MergeBB); // Codegen of 'Then' can change the current block, update ThenBB for the PHI. @@ -349,7 +369,7 @@ of the block in the CFG. Why then, are we getting the current block when we just set it to ThenBB 5 lines above? The problem is that the "Then" expression may actually itself change the block that the Builder is emitting into if, for example, it contains a nested "if/then/else" -expression. Because calling Codegen recursively could arbitrarily change +expression. Because calling ``codegen()`` recursively could arbitrarily change the notion of the current block, we are required to get an up-to-date value for code that will set up the Phi node. @@ -359,11 +379,12 @@ value for code that will set up the Phi node. TheFunction->getBasicBlockList().push_back(ElseBB); Builder.SetInsertPoint(ElseBB); - Value *ElseV = Else->Codegen(); - if (ElseV == 0) return 0; + Value *ElseV = Else->codegen(); + if (!ElseV) + return nullptr; Builder.CreateBr(MergeBB); - // Codegen of 'Else' can change the current block, update ElseBB for the PHI. + // codegen of 'Else' can change the current block, update ElseBB for the PHI. ElseBB = Builder.GetInsertBlock(); Code generation for the 'else' block is basically identical to codegen @@ -378,8 +399,8 @@ code: // Emit merge block. TheFunction->getBasicBlockList().push_back(MergeBB); Builder.SetInsertPoint(MergeBB); - PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, - "iftmp"); + PHINode *PN = + Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp"); PN->addIncoming(ThenV, ThenBB); PN->addIncoming(ElseV, ElseBB); @@ -444,13 +465,20 @@ The lexer extensions are the same sort of thing as for if/then/else: tok_for = -9, tok_in = -10 ... in gettok ... - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; + if (IdentifierStr == "def") + return tok_def; + if (IdentifierStr == "extern") + return tok_extern; + if (IdentifierStr == "if") + return tok_if; + if (IdentifierStr == "then") + return tok_then; + if (IdentifierStr == "else") + return tok_else; + if (IdentifierStr == "for") + return tok_for; + if (IdentifierStr == "in") + return tok_in; return tok_identifier; AST Extensions for the 'for' Loop @@ -464,12 +492,15 @@ variable name and the constituent expressions in the node. /// ForExprAST - Expression class for for/in. 
class ForExprAST : public ExprAST { std::string VarName; - ExprAST *Start, *End, *Step, *Body; + std::unique_ptr<ExprAST> Start, End, Step, Body; + public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); + ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start, + std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step, + std::unique_ptr<ExprAST> Body) + : VarName(VarName), Start(std::move(Start)), End(std::move(End)), + Step(std::move(Step)), Body(std::move(Body)) {} + virtual Value *codegen(); }; Parser Extensions for the 'for' Loop @@ -483,7 +514,7 @@ value to null in the AST node: .. code-block:: c++ /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression - static ExprAST *ParseForExpr() { + static std::unique_ptr<ExprAST> ParseForExpr() { getNextToken(); // eat the for. if (CurTok != tok_identifier) @@ -497,31 +528,37 @@ value to null in the AST node: getNextToken(); // eat '='. - ExprAST *Start = ParseExpression(); - if (Start == 0) return 0; + auto Start = ParseExpression(); + if (!Start) + return nullptr; if (CurTok != ',') return Error("expected ',' after for start value"); getNextToken(); - ExprAST *End = ParseExpression(); - if (End == 0) return 0; + auto End = ParseExpression(); + if (!End) + return nullptr; // The step value is optional. - ExprAST *Step = 0; + std::unique_ptr<ExprAST> Step; if (CurTok == ',') { getNextToken(); Step = ParseExpression(); - if (Step == 0) return 0; + if (!Step) + return nullptr; } if (CurTok != tok_in) return Error("expected 'in' after for"); getNextToken(); // eat 'in'. - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; + auto Body = ParseExpression(); + if (!Body) + return nullptr; - return new ForExprAST(IdName, Start, End, Step, Body); + return llvm::make_unique<ForExprAST>(IdName, std::move(Start), + std::move(End), std::move(Step), + std::move(Body)); } LLVM IR for the 'for' Loop @@ -565,14 +602,14 @@ together. Code Generation for the 'for' Loop ---------------------------------- -The first part of Codegen is very simple: we just output the start +The first part of codegen is very simple: we just output the start expression for the loop value: .. code-block:: c++ - Value *ForExprAST::Codegen() { + Value *ForExprAST::codegen() { // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); + Value *StartVal = Start->codegen(); if (StartVal == 0) return 0; With this out of the way, the next step is to set up the LLVM basic @@ -587,7 +624,8 @@ expression). // block. Function *TheFunction = Builder.GetInsertBlock()->getParent(); BasicBlock *PreheaderBB = Builder.GetInsertBlock(); - BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); + BasicBlock *LoopBB = + BasicBlock::Create(getGlobalContext(), "loop", TheFunction); // Insert an explicit fall through from the current block to the LoopBB. Builder.CreateBr(LoopBB); @@ -604,7 +642,8 @@ the two blocks. Builder.SetInsertPoint(LoopBB); // Start the PHI node with an entry for Start. 
- PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); + PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), + 2, VarName.c_str()); Variable->addIncoming(StartVal, PreheaderBB); Now that the "preheader" for the loop is set up, we switch to emitting @@ -624,8 +663,8 @@ backedge, but we can't set it up yet (because it doesn't exist!). // Emit the body of the loop. This, like any other expr, can change the // current BB. Note that we ignore the value computed by the body, but don't // allow an error. - if (Body->Codegen() == 0) - return 0; + if (!Body->codegen()) + return nullptr; Now the code starts to get more interesting. Our 'for' loop introduces a new variable to the symbol table. This means that our symbol table can @@ -647,10 +686,11 @@ table. .. code-block:: c++ // Emit the step value. - Value *StepVal; + Value *StepVal = nullptr; if (Step) { - StepVal = Step->Codegen(); - if (StepVal == 0) return 0; + StepVal = Step->codegen(); + if (!StepVal) + return nullptr; } else { // If not specified, use 1.0. StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); @@ -666,13 +706,13 @@ iteration of the loop. .. code-block:: c++ // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; + Value *EndCond = End->codegen(); + if (!EndCond) + return nullptr; // Convert condition to a bool by comparing equal to 0.0. - EndCond = Builder.CreateFCmpONE(EndCond, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "loopcond"); + EndCond = Builder.CreateFCmpONE( + EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond"); Finally, we evaluate the exit value of the loop, to determine whether the loop should exit. This mirrors the condition evaluation for the @@ -682,7 +722,8 @@ if/then/else statement. // Create the "after loop" block and insert it. BasicBlock *LoopEndBB = Builder.GetInsertBlock(); - BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); + BasicBlock *AfterBB = + BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); // Insert the conditional branch into the end of LoopEndBB. Builder.CreateCondBr(EndCond, LoopBB, AfterBB); @@ -718,7 +759,7 @@ value, we can add the incoming value to the loop PHI node. After that, we remove the loop variable from the symbol table, so that it isn't in scope after the for loop. Finally, code generation of the for loop always returns 0.0, so that is what we return from -``ForExprAST::Codegen``. +``ForExprAST::codegen()``. With this, we conclude the "adding control flow to Kaleidoscope" chapter of the tutorial. In this chapter we added two control flow constructs, diff --git a/docs/tutorial/LangImpl6.rst b/docs/tutorial/LangImpl6.rst index bf78bdea74d6..827cd392effb 100644 --- a/docs/tutorial/LangImpl6.rst +++ b/docs/tutorial/LangImpl6.rst @@ -24,7 +24,7 @@ is good or bad. In this tutorial we'll assume that it is okay to use this as a way to show some interesting parsing techniques. At the end of this tutorial, we'll run through an example Kaleidoscope -application that `renders the Mandelbrot set <#example>`_. This gives an +application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an example of what you can build with Kaleidoscope and its feature set. User-defined Operators: the Idea @@ -96,19 +96,24 @@ keywords: enum Token { ... // operators - tok_binary = -11, tok_unary = -12 + tok_binary = -11, + tok_unary = -12 }; ... static int gettok() { ... 
- if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary; + if (IdentifierStr == "for") + return tok_for; + if (IdentifierStr == "in") + return tok_in; + if (IdentifierStr == "binary") + return tok_binary; + if (IdentifierStr == "unary") + return tok_unary; return tok_identifier; This just adds lexer support for the unary and binary keywords, like we -did in `previous chapters <LangImpl5.html#iflexer>`_. One nice thing +did in `previous chapters <LangImpl5.html#lexer-extensions-for-if-then-else>`_. One nice thing about our current AST, is that we represent binary operators with full generalisation by using their ASCII code as the opcode. For our extended operators, we'll use this same representation, so we don't need any new @@ -129,15 +134,17 @@ this: class PrototypeAST { std::string Name; std::vector<std::string> Args; - bool isOperator; + bool IsOperator; unsigned Precedence; // Precedence if a binary op. + public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args, - bool isoperator = false, unsigned prec = 0) - : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} + PrototypeAST(const std::string &name, std::vector<std::string> Args, + bool IsOperator = false, unsigned Prec = 0) + : Name(name), Args(std::move(Args)), IsOperator(IsOperator), + Precedence(Prec) {} - bool isUnaryOp() const { return isOperator && Args.size() == 1; } - bool isBinaryOp() const { return isOperator && Args.size() == 2; } + bool isUnaryOp() const { return IsOperator && Args.size() == 1; } + bool isBinaryOp() const { return IsOperator && Args.size() == 2; } char getOperatorName() const { assert(isUnaryOp() || isBinaryOp()); @@ -146,7 +153,7 @@ this: unsigned getBinaryPrecedence() const { return Precedence; } - Function *Codegen(); + Function *codegen(); }; Basically, in addition to knowing a name for the prototype, we now keep @@ -161,7 +168,7 @@ user-defined operator, we need to parse it: /// prototype /// ::= id '(' id* ')' /// ::= binary LETTER number? (id, id) - static PrototypeAST *ParsePrototype() { + static std::unique_ptr<PrototypeAST> ParsePrototype() { std::string FnName; unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. @@ -210,7 +217,8 @@ user-defined operator, we need to parse it: if (Kind && ArgNames.size() != Kind) return ErrorP("Invalid number of operands for operator"); - return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); + return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0, + BinaryPrecedence); } This is all fairly straightforward parsing code, and we have already @@ -227,26 +235,31 @@ default case for our existing binary operator node: .. 
code-block:: c++ - Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; + Value *BinaryExprAST::codegen() { + Value *L = LHS->codegen(); + Value *R = RHS->codegen(); + if (!L || !R) + return nullptr; switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); + case '+': + return Builder.CreateFAdd(L, R, "addtmp"); + case '-': + return Builder.CreateFSub(L, R, "subtmp"); + case '*': + return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); // Convert bool 0/1 to double 0.0 or 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); - default: break; + default: + break; } // If it wasn't a builtin binary operator, it must be a user defined one. Emit // a call to it. - Function *F = TheModule->getFunction(std::string("binary")+Op); + Function *F = TheModule->getFunction(std::string("binary") + Op); assert(F && "binary operator not found!"); Value *Ops[2] = { L, R }; @@ -263,12 +276,12 @@ The final piece of code we are missing, is a bit of top-level magic: .. code-block:: c++ - Function *FunctionAST::Codegen() { + Function *FunctionAST::codegen() { NamedValues.clear(); - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; + Function *TheFunction = Proto->codegen(); + if (!TheFunction) + return nullptr; // If this is an operator, install it. if (Proto->isBinaryOp()) @@ -278,7 +291,7 @@ The final piece of code we are missing, is a bit of top-level magic: BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); Builder.SetInsertPoint(BB); - if (Value *RetVal = Body->Codegen()) { + if (Value *RetVal = Body->codegen()) { ... Basically, before codegening a function, if it is a user-defined @@ -305,11 +318,12 @@ that, we need an AST node: /// UnaryExprAST - Expression class for a unary operator. class UnaryExprAST : public ExprAST { char Opcode; - ExprAST *Operand; + std::unique_ptr<ExprAST> Operand; + public: - UnaryExprAST(char opcode, ExprAST *operand) - : Opcode(opcode), Operand(operand) {} - virtual Value *Codegen(); + UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand) + : Opcode(Opcode), Operand(std::move(Operand)) {} + virtual Value *codegen(); }; This AST node is very simple and obvious by now. It directly mirrors the @@ -322,7 +336,7 @@ simple: we'll add a new function to do it: /// unary /// ::= primary /// ::= '!' unary - static ExprAST *ParseUnary() { + static std::unique_ptr<ExprAST> ParseUnary() { // If the current token is not an operator, it must be a primary expr. if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') return ParsePrimary(); @@ -330,9 +344,9 @@ simple: we'll add a new function to do it: // If this is a unary operator, read it. int Opc = CurTok; getNextToken(); - if (ExprAST *Operand = ParseUnary()) - return new UnaryExprAST(Opc, Operand); - return 0; + if (auto Operand = ParseUnary()) + return llvm::unique_ptr<UnaryExprAST>(Opc, std::move(Operand)); + return nullptr; } The grammar we add is pretty straightforward here. If we see a unary @@ -350,21 +364,24 @@ call ParseUnary instead: /// binoprhs /// ::= ('+' unary)* - static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, + std::unique_ptr<ExprAST> LHS) { ... // Parse the unary expression after the binary operator. 
- ExprAST *RHS = ParseUnary(); - if (!RHS) return 0; + auto RHS = ParseUnary(); + if (!RHS) + return nullptr; ... } /// expression /// ::= unary binoprhs /// - static ExprAST *ParseExpression() { - ExprAST *LHS = ParseUnary(); - if (!LHS) return 0; + static std::unique_ptr<ExprAST> ParseExpression() { + auto LHS = ParseUnary(); + if (!LHS) + return nullptr; - return ParseBinOpRHS(0, LHS); + return ParseBinOpRHS(0, std::move(LHS)); } With these two simple changes, we are now able to parse unary operators @@ -378,7 +395,7 @@ operator code above with: /// ::= id '(' id* ')' /// ::= binary LETTER number? (id, id) /// ::= unary LETTER (id) - static PrototypeAST *ParsePrototype() { + static std::unique_ptr<PrototypeAST> ParsePrototype() { std::string FnName; unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. @@ -411,12 +428,13 @@ unary operators. It looks like this: .. code-block:: c++ - Value *UnaryExprAST::Codegen() { - Value *OperandV = Operand->Codegen(); - if (OperandV == 0) return 0; + Value *UnaryExprAST::codegen() { + Value *OperandV = Operand->codegen(); + if (!OperandV) + return nullptr; Function *F = TheModule->getFunction(std::string("unary")+Opcode); - if (F == 0) + if (!F) return ErrorV("Unknown unary operator"); return Builder.CreateCall(F, OperandV, "unop"); diff --git a/docs/tutorial/LangImpl7.rst b/docs/tutorial/LangImpl7.rst index 648940785b09..1cd7d56fddb4 100644 --- a/docs/tutorial/LangImpl7.rst +++ b/docs/tutorial/LangImpl7.rst @@ -118,7 +118,7 @@ that @G defines *space* for an i32 in the global data area, but its *name* actually refers to the address for that space. Stack variables work the same way, except that instead of being declared with global variable definitions, they are declared with the `LLVM alloca -instruction <../LangRef.html#i_alloca>`_: +instruction <../LangRef.html#alloca-instruction>`_: .. code-block:: llvm @@ -221,7 +221,7 @@ variables in certain circumstances: funny pointer arithmetic is involved, the alloca will not be promoted. #. mem2reg only works on allocas of `first - class <../LangRef.html#t_classifications>`_ values (such as pointers, + class <../LangRef.html#first-class-types>`_ values (such as pointers, scalars and vectors), and only if the array size of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of promoting structs or arrays to registers. Note that the "scalarrepl" pass is @@ -355,10 +355,11 @@ from the stack slot: .. code-block:: c++ - Value *VariableExprAST::Codegen() { + Value *VariableExprAST::codegen() { // Look this variable up in the function. Value *V = NamedValues[Name]; - if (V == 0) return ErrorV("Unknown variable name"); + if (!V) + return ErrorV("Unknown variable name"); // Load the value. return Builder.CreateLoad(V, Name.c_str()); @@ -366,7 +367,7 @@ from the stack slot: As you can see, this is pretty straightforward. Now we need to update the things that define the variables to set up the alloca. We'll start -with ``ForExprAST::Codegen`` (see the `full code listing <#code>`_ for +with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for the unabridged code): .. code-block:: c++ @@ -377,16 +378,18 @@ the unabridged code): AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; + Value *StartVal = Start->codegen(); + if (!StartVal) + return nullptr; // Store the value into the alloca. Builder.CreateStore(StartVal, Alloca); ... 
// Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; + Value *EndCond = End->codegen(); + if (!EndCond) + return nullptr; // Reload, increment, and restore the alloca. This handles the case where // the body of the loop mutates the variable. @@ -396,7 +399,7 @@ the unabridged code): ... This code is virtually identical to the code `before we allowed mutable -variables <LangImpl5.html#forcodegen>`_. The big difference is that we +variables <LangImpl5.html#code-generation-for-the-for-loop>`_. The big difference is that we no longer have to construct a PHI node, and we use load/store to access the variable as needed. @@ -423,7 +426,7 @@ them. The code for this is also pretty simple: For each argument, we make an alloca, store the input value to the function into the alloca, and register the alloca as the memory location -for the argument. This method gets invoked by ``FunctionAST::Codegen`` +for the argument. This method gets invoked by ``FunctionAST::codegen()`` right after it sets up the entry block for the function. The final missing piece is adding the mem2reg pass, which allows us to @@ -569,11 +572,11 @@ implement codegen for the assignment operator. This looks like: .. code-block:: c++ - Value *BinaryExprAST::Codegen() { + Value *BinaryExprAST::codegen() { // Special case '=' because we don't want to emit the LHS as an expression. if (Op == '=') { // Assignment requires the LHS to be an identifier. - VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); + VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get()); if (!LHSE) return ErrorV("destination of '=' must be a variable"); @@ -587,12 +590,14 @@ allowed. .. code-block:: c++ // Codegen the RHS. - Value *Val = RHS->Codegen(); - if (Val == 0) return 0; + Value *Val = RHS->codegen(); + if (!Val) + return nullptr; // Look up the name. Value *Variable = NamedValues[LHSE->getName()]; - if (Variable == 0) return ErrorV("Unknown variable name"); + if (!Variable) + return ErrorV("Unknown variable name"); Builder.CreateStore(Val, Variable); return Val; @@ -649,10 +654,14 @@ this: ... static int gettok() { ... - if (IdentifierStr == "in") return tok_in; - if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary; - if (IdentifierStr == "var") return tok_var; + if (IdentifierStr == "in") + return tok_in; + if (IdentifierStr == "binary") + return tok_binary; + if (IdentifierStr == "unary") + return tok_unary; + if (IdentifierStr == "var") + return tok_var; return tok_identifier; ... 
@@ -663,14 +672,15 @@ var/in, it looks like this: /// VarExprAST - Expression class for var/in class VarExprAST : public ExprAST { - std::vector<std::pair<std::string, ExprAST*> > VarNames; - ExprAST *Body; + std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; + std::unique_ptr<ExprAST> Body; + public: - VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames, - ExprAST *body) - : VarNames(varnames), Body(body) {} + VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames, + std::unique_ptr<ExprAST> body) + : VarNames(std::move(VarNames)), Body(std::move(Body)) {} - virtual Value *Codegen(); + virtual Value *codegen(); }; var/in allows a list of names to be defined all at once, and each name @@ -690,15 +700,22 @@ do is add it as a primary expression: /// ::= ifexpr /// ::= forexpr /// ::= varexpr - static ExprAST *ParsePrimary() { + static std::unique_ptr<ExprAST> ParsePrimary() { switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); - case tok_for: return ParseForExpr(); - case tok_var: return ParseVarExpr(); + default: + return Error("unknown token when expecting an expression"); + case tok_identifier: + return ParseIdentifierExpr(); + case tok_number: + return ParseNumberExpr(); + case '(': + return ParseParenExpr(); + case tok_if: + return ParseIfExpr(); + case tok_for: + return ParseForExpr(); + case tok_var: + return ParseVarExpr(); } } @@ -708,10 +725,10 @@ Next we define ParseVarExpr: /// varexpr ::= 'var' identifier ('=' expression)? // (',' identifier ('=' expression)?)* 'in' expression - static ExprAST *ParseVarExpr() { + static std::unique_ptr<ExprAST> ParseVarExpr() { getNextToken(); // eat the var. - std::vector<std::pair<std::string, ExprAST*> > VarNames; + std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; // At least one variable name is required. if (CurTok != tok_identifier) @@ -727,15 +744,15 @@ into the local ``VarNames`` vector. getNextToken(); // eat identifier. // Read the optional initializer. - ExprAST *Init = 0; + std::unique_ptr<ExprAST> Init; if (CurTok == '=') { getNextToken(); // eat the '='. Init = ParseExpression(); - if (Init == 0) return 0; + if (!Init) return nullptr; } - VarNames.push_back(std::make_pair(Name, Init)); + VarNames.push_back(std::make_pair(Name, std::move(Init))); // End of var list, exit loop. if (CurTok != ',') break; @@ -755,10 +772,12 @@ AST node: return Error("expected 'in' keyword after 'var'"); getNextToken(); // eat 'in'. - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; + auto Body = ParseExpression(); + if (!Body) + return nullptr; - return new VarExprAST(VarNames, Body); + return llvm::make_unique<VarExprAST>(std::move(VarNames), + std::move(Body)); } Now that we can parse and represent the code, we need to support @@ -766,7 +785,7 @@ emission of LLVM IR for it. This code starts out with: .. code-block:: c++ - Value *VarExprAST::Codegen() { + Value *VarExprAST::codegen() { std::vector<AllocaInst *> OldBindings; Function *TheFunction = Builder.GetInsertBlock()->getParent(); @@ -774,7 +793,7 @@ emission of LLVM IR for it. This code starts out with: // Register all variables and emit their initializer. 
for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { const std::string &VarName = VarNames[i].first; - ExprAST *Init = VarNames[i].second; + ExprAST *Init = VarNames[i].second.get(); Basically it loops over all the variables, installing them one at a time. For each variable we put into the symbol table, we remember the @@ -789,8 +808,9 @@ previous value that we replace in OldBindings. // var a = a in ... # refers to outer 'a'. Value *InitVal; if (Init) { - InitVal = Init->Codegen(); - if (InitVal == 0) return 0; + InitVal = Init->codegen(); + if (!InitVal) + return nullptr; } else { // If not specified, use 0.0. InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); } @@ -814,8 +834,9 @@ we evaluate the body of the var/in expression: .. code-block:: c++ // Codegen the body, now that all vars are in scope. - Value *BodyVal = Body->Codegen(); - if (BodyVal == 0) return 0; + Value *BodyVal = Body->codegen(); + if (!BodyVal) + return nullptr; Finally, before returning, we restore the previous variable bindings: diff --git a/docs/tutorial/LangImpl8.rst b/docs/tutorial/LangImpl8.rst index 0b9b39c84b75..3b0f443f08d5 100644 --- a/docs/tutorial/LangImpl8.rst +++ b/docs/tutorial/LangImpl8.rst @@ -75,8 +75,8 @@ statement be our "main": .. code-block:: udiff - - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - + PrototypeAST *Proto = new PrototypeAST("main", std::vector<std::string>()); + - auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); + + auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>()); just with the simple change of giving it a name. @@ -108,19 +108,19 @@ code is that the llvm IR goes to standard error: @@ -1108,17 +1108,8 @@ static void HandleExtern() { static void HandleTopLevelExpression() { // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - - if (Function *LF = F->Codegen()) { + if (auto FnAST = ParseTopLevelExpr()) { + - if (auto *FnIR = FnAST->codegen()) { - // We're just doing this to make sure it executes. - TheExecutionEngine->finalizeObject(); - // JIT the function, returning a function pointer. - - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); + - void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - // Ignore the return value for this. - (void)FP; - + if (!F->Codegen()) { + + if (!F->codegen()) { + fprintf(stderr, "Error generating code for top level expr"); } } else { @@ -165,13 +165,13 @@ DWARF Emission Setup ==================== Similar to the ``IRBuilder`` class we have a -```DIBuilder`` <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class +`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class that helps in constructing debug metadata for an llvm IR file. It corresponds 1:1 similarly to ``IRBuilder`` and llvm IR, but with nicer names. Using it does require that you be more familiar with DWARF terminology than you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you read through the general documentation on the -```Metadata Format`` <http://llvm.org/docs/SourceLevelDebugging.html>`_ it +`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it should be a little more clear. We'll be using this class to construct all of our IR level descriptions. 
Construction for it takes a module so we need to construct it shortly after we construct our module. We've left it @@ -237,7 +237,7 @@ Functions ========= Now that we have our ``Compile Unit`` and our source locations, we can add -function definitions to the debug info. So in ``PrototypeAST::Codegen`` we +function definitions to the debug info. So in ``PrototypeAST::codegen()`` we add a few lines of code to describe a context for our subprogram, in this case the "File", and the actual definition of the function itself. @@ -261,7 +261,8 @@ information) and construct our function definition: DISubprogram *SP = DBuilder->createFunction( FContext, Name, StringRef(), Unit, LineNo, CreateFunctionType(Args.size(), Unit), false /* internal linkage */, - true /* definition */, ScopeLine, DINode::FlagPrototyped, false, F); + true /* definition */, ScopeLine, DINode::FlagPrototyped, false); + F->setSubprogram(SP); and we now have an DISubprogram that contains a reference to all of our metadata for the function. @@ -307,10 +308,12 @@ and then we have added to all of our AST classes a source location: SourceLocation Loc; public: + ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} + virtual ~ExprAST() {} + virtual Value* codegen() = 0; int getLine() const { return Loc.Line; } int getCol() const { return Loc.Col; } - ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} - virtual std::ostream &dump(std::ostream &out, int ind) { + virtual raw_ostream &dump(raw_ostream &out, int ind) { return out << ':' << getLine() << ':' << getCol() << '\n'; } @@ -318,7 +321,8 @@ that we pass down through when we create a new expression: .. code-block:: c++ - LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS); + LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS), + std::move(RHS)); giving us locations for each of our expressions and variables. @@ -395,13 +399,12 @@ argument allocas in ``PrototypeAST::CreateArgumentAllocas``. DIScope *Scope = KSDbgInfo.LexicalBlocks.back(); DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), KSDbgInfo.TheCU.getDirectory()); - DILocalVariable D = DBuilder->createLocalVariable( - dwarf::DW_TAG_arg_variable, Scope, Args[Idx], Unit, Line, - KSDbgInfo.getDoubleTy(), Idx); + DILocalVariable D = DBuilder->createParameterVariable( + Scope, Args[Idx], Idx + 1, Unit, Line, KSDbgInfo.getDoubleTy(), true); - Instruction *Call = DBuilder->insertDeclare( - Alloca, D, DBuilder->createExpression(), Builder.GetInsertBlock()); - Call->setDebugLoc(DebugLoc::get(Line, 0, Scope)); + DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(), + DebugLoc::get(Line, 0, Scope), + Builder.GetInsertBlock()); Here we're doing a few things. First, we're grabbing our current scope for the variable so we can say what range of code our variable is valid @@ -409,7 +412,7 @@ through. Second, we're creating the variable, giving it the scope, the name, source location, type, and since it's an argument, the argument index. Third, we create an ``lvm.dbg.declare`` call to indicate at the IR level that we've got a variable in an alloca (and it gives a starting -location for the variable). Lastly, we set a source location for the +location for the variable), and setting a source location for the beginning of the scope on the declare. 
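The patch above does not show how these source locations actually get attached to the
instructions we emit. In the full chapter this is done by a small helper that stamps the
``IRBuilder`` with the current expression's position. A rough sketch, assuming the
``KSDbgInfo`` object with the ``TheCU`` and ``LexicalBlocks`` members referenced above, looks
something like:

.. code-block:: c++

    void DebugInfo::emitLocation(ExprAST *AST) {
      // Use the compile unit as the scope until a lexical block has been pushed
      // for the current function.
      DIScope *Scope;
      if (LexicalBlocks.empty())
        Scope = TheCU;
      else
        Scope = LexicalBlocks.back();
      // Tag subsequently emitted instructions with this expression's line/column.
      Builder.SetCurrentDebugLocation(
          DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
    }

Again, this is a sketch of the helper the chapter uses elsewhere, not part of this patch; the
``getLine()``/``getCol()`` accessors are the ones added to ``ExprAST`` above.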
One interesting thing to note at this point is that various debuggers have diff --git a/docs/tutorial/LangImpl9.rst b/docs/tutorial/LangImpl9.rst index 6c43d53f90f9..f02bba857c14 100644 --- a/docs/tutorial/LangImpl9.rst +++ b/docs/tutorial/LangImpl9.rst @@ -49,7 +49,7 @@ For example, try adding: extending the type system in all sorts of interesting ways. Simple arrays are very easy and are quite useful for many different applications. Adding them is mostly an exercise in learning how the - LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction + LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction works: it is so nifty/unconventional, it `has its own FAQ <../GetElementPtr.html>`_! If you add support for recursive types (e.g. linked lists), make sure to read the `section in the LLVM diff --git a/docs/tutorial/OCamlLangImpl1.rst b/docs/tutorial/OCamlLangImpl1.rst index 94ca3a5aa4d3..cf968b5ae89c 100644 --- a/docs/tutorial/OCamlLangImpl1.rst +++ b/docs/tutorial/OCamlLangImpl1.rst @@ -139,7 +139,7 @@ useful for mutually recursive functions). For example: A more interesting example is included in Chapter 6 where we write a little Kaleidoscope application that `displays a Mandelbrot -Set <OCamlLangImpl6.html#example>`_ at various levels of magnification. +Set <OCamlLangImpl6.html#kicking-the-tires>`_ at various levels of magnification. Lets dive into the implementation of this language! @@ -275,7 +275,7 @@ file. These are handled with this code: | [< >] -> [< >] With this, we have the complete lexer for the basic Kaleidoscope -language (the `full code listing <OCamlLangImpl2.html#code>`_ for the +language (the `full code listing <OCamlLangImpl2.html#full-code-listing>`_ for the Lexer is available in the `next chapter <OCamlLangImpl2.html>`_ of the tutorial). Next we'll `build a simple parser that uses this to build an Abstract Syntax Tree <OCamlLangImpl2.html>`_. When we have that, we'll diff --git a/docs/tutorial/OCamlLangImpl2.rst b/docs/tutorial/OCamlLangImpl2.rst index 905b306746f1..f5d6cd6822c9 100644 --- a/docs/tutorial/OCamlLangImpl2.rst +++ b/docs/tutorial/OCamlLangImpl2.rst @@ -130,7 +130,7 @@ We start with numeric literals, because they are the simplest to process. For each production in our grammar, we'll define a function which parses that production. We call this class of expressions "primary" expressions, for reasons that will become more clear `later in -the tutorial <OCamlLangImpl6.html#unary>`_. In order to parse an +the tutorial <OCamlLangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary primary expression, we need to determine what sort of expression it is. For numeric literals, we have: @@ -280,7 +280,7 @@ fixed-size array). With the helper above defined, we can now start parsing binary expressions. The basic idea of operator precedence parsing is to break down an expression with potentially ambiguous binary operators into -pieces. Consider ,for example, the expression "a+b+(c+d)\*e\*f+g". +pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g". Operator precedence parsing considers this as a stream of primary expressions separated by binary operators. As such, it will first parse the leading primary expression "a", then it will see the pairs [+, b] @@ -505,7 +505,7 @@ The Driver The driver for this simply invokes all of the parsing pieces with a top-level dispatch loop. There isn't much interesting here, so I'll just -include the top-level loop. 
See `below <#code>`_ for full code in the +include the top-level loop. See `below <#full-code-listing>`_ for full code in the "Top-Level Parsing" section. .. code-block:: ocaml diff --git a/docs/tutorial/OCamlLangImpl3.rst b/docs/tutorial/OCamlLangImpl3.rst index 10d463b93ac3..a76b46d1bf6b 100644 --- a/docs/tutorial/OCamlLangImpl3.rst +++ b/docs/tutorial/OCamlLangImpl3.rst @@ -114,8 +114,8 @@ values that can be in the ``Codegen.named_values`` map are function arguments. This code simply checks to see that the specified name is in the map (if not, an unknown variable is being referenced) and returns the value for it. In future chapters, we'll add support for `loop -induction variables <LangImpl5.html#for>`_ in the symbol table, and for -`local variables <LangImpl7.html#localvars>`_. +induction variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for +`local variables <LangImpl7.html#user-defined-local-variables>`_. .. code-block:: ocaml @@ -152,22 +152,22 @@ automatically provide each one with an increasing, unique numeric suffix. Local value names for instructions are purely optional, but it makes it much easier to read the IR dumps. -`LLVM instructions <../LangRef.html#instref>`_ are constrained by strict +`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict rules: for example, the Left and Right operators of an `add -instruction <../LangRef.html#i_add>`_ must have the same type, and the +instruction <../LangRef.html#add-instruction>`_ must have the same type, and the result type of the add must match the operand types. Because all values in Kaleidoscope are doubles, this makes for very simple code for add, sub and mul. On the other hand, LLVM specifies that the `fcmp -instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a +instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with a `uitofp -instruction <../LangRef.html#i_uitofp>`_. This instruction converts its +instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its input integer into a floating point value by treating the input as an unsigned value. In contrast, if we used the `sitofp -instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator +instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator would return 0.0 and -1.0, depending on the input value. .. code-block:: ocaml @@ -196,7 +196,7 @@ to resolve function names for us. Once we have the function to call, we recursively codegen each argument that is to be passed in, and create an LLVM `call -instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C +instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C calling conventions by default, allowing these calls to also call into standard library functions like "sin" and "cos", with no additional effort. @@ -253,7 +253,7 @@ The final line above checks if the function has already been defined in This indicates the type and name to use, as well as which module to insert into. By default we assume a function has ``Llvm.Linkage.ExternalLinkage``. 
"`external -linkage <LangRef.html#linkage>`_" means that the function may be defined +linkage <../LangRef.html#linkage>`_" means that the function may be defined outside the current module and/or that it is callable by functions outside the module. The "``name``" passed in is the name the user specified: this name is registered in "``Codegen.the_module``"s symbol @@ -360,7 +360,7 @@ Once the insertion point is set up, we call the ``Codegen.codegen_func`` method for the root expression of the function. If no error happens, this emits code to compute the expression into the entry block and returns the value that was computed. Assuming no error, we then create -an LLVM `ret instruction <../LangRef.html#i_ret>`_, which completes the +an LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the function. Once the function is built, we call ``Llvm_analysis.assert_valid_function``, which is provided by LLVM. This function does a variety of consistency checks on the generated code, to @@ -413,10 +413,10 @@ For example: Note how the parser turns the top-level expression into anonymous functions for us. This will be handy when we add `JIT -support <OCamlLangImpl4.html#jit>`_ in the next chapter. Also note that +support <OCamlLangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the code is very literally transcribed, no optimizations are being performed. We will `add -optimizations <OCamlLangImpl4.html#trivialconstfold>`_ explicitly in the +optimizations <OCamlLangImpl4.html#trivial-constant-folding>`_ explicitly in the next chapter. :: diff --git a/docs/tutorial/OCamlLangImpl4.rst b/docs/tutorial/OCamlLangImpl4.rst index b13b2afa8883..feeba01be24b 100644 --- a/docs/tutorial/OCamlLangImpl4.rst +++ b/docs/tutorial/OCamlLangImpl4.rst @@ -130,7 +130,7 @@ exactly the code we have now, except that we would defer running the optimizer until the entire file has been parsed. In order to get per-function optimizations going, we need to set up a -`Llvm.PassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold and +`Llvm.PassManager <../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold and organize the LLVM optimizations that we want to run. Once we have that, we can add a set of optimizations to run. The code looks like this: diff --git a/docs/tutorial/OCamlLangImpl5.rst b/docs/tutorial/OCamlLangImpl5.rst index 0faecfb9222e..675b9bc1978b 100644 --- a/docs/tutorial/OCamlLangImpl5.rst +++ b/docs/tutorial/OCamlLangImpl5.rst @@ -175,7 +175,7 @@ Kaleidoscope looks like this: To visualize the control flow graph, you can use a nifty feature of the LLVM '`opt <http://llvm.org/cmds/opt.html>`_' tool. If you put this LLVM IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a -window will pop up <../ProgrammersManual.html#ViewGraph>`_ and you'll +window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll see this graph: .. figure:: LangImpl5-cfg.png diff --git a/docs/tutorial/OCamlLangImpl6.rst b/docs/tutorial/OCamlLangImpl6.rst index 36bffa8e9696..a3ae11fd7e54 100644 --- a/docs/tutorial/OCamlLangImpl6.rst +++ b/docs/tutorial/OCamlLangImpl6.rst @@ -24,7 +24,7 @@ is good or bad. In this tutorial we'll assume that it is okay to use this as a way to show some interesting parsing techniques. At the end of this tutorial, we'll run through an example Kaleidoscope -application that `renders the Mandelbrot set <#example>`_. This gives an +application that `renders the Mandelbrot set <#kicking-the-tires>`_. 
This gives an example of what you can build with Kaleidoscope and its feature set. User-defined Operators: the Idea @@ -108,7 +108,7 @@ keywords: | "unary" -> [< 'Token.Unary; stream >] This just adds lexer support for the unary and binary keywords, like we -did in `previous chapters <OCamlLangImpl5.html#iflexer>`_. One nice +did in `previous chapters <OCamlLangImpl5.html#lexer-extensions-for-if-then-else>`_. One nice thing about our current AST, is that we represent binary operators with full generalisation by using their ASCII code as the opcode. For our extended operators, we'll use this same representation, so we don't need diff --git a/docs/tutorial/OCamlLangImpl7.rst b/docs/tutorial/OCamlLangImpl7.rst index 98ea93f42f3f..c8c701b91012 100644 --- a/docs/tutorial/OCamlLangImpl7.rst +++ b/docs/tutorial/OCamlLangImpl7.rst @@ -118,7 +118,7 @@ that @G defines *space* for an i32 in the global data area, but its *name* actually refers to the address for that space. Stack variables work the same way, except that instead of being declared with global variable definitions, they are declared with the `LLVM alloca -instruction <../LangRef.html#i_alloca>`_: +instruction <../LangRef.html#alloca-instruction>`_: .. code-block:: llvm @@ -221,7 +221,7 @@ variables in certain circumstances: funny pointer arithmetic is involved, the alloca will not be promoted. #. mem2reg only works on allocas of `first - class <../LangRef.html#t_classifications>`_ values (such as pointers, + class <../LangRef.html#first-class-types>`_ values (such as pointers, scalars and vectors), and only if the array size of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of promoting structs or arrays to registers. Note that the "scalarrepl" pass is @@ -367,7 +367,7 @@ from the stack slot: As you can see, this is pretty straightforward. Now we need to update the things that define the variables to set up the alloca. We'll start -with ``codegen_expr Ast.For ...`` (see the `full code listing <#code>`_ +with ``codegen_expr Ast.For ...`` (see the `full code listing <#id1>`_ for the unabridged code): .. code-block:: ocaml @@ -407,7 +407,7 @@ for the unabridged code): ... This code is virtually identical to the code `before we allowed mutable -variables <OCamlLangImpl5.html#forcodegen>`_. The big difference is that +variables <OCamlLangImpl5.html#code-generation-for-the-for-loop>`_. The big difference is that we no longer have to construct a PHI node, and we use load/store to access the variable as needed. diff --git a/docs/tutorial/OCamlLangImpl8.rst b/docs/tutorial/OCamlLangImpl8.rst index 0346fa9fed14..3ab6db35dfb0 100644 --- a/docs/tutorial/OCamlLangImpl8.rst +++ b/docs/tutorial/OCamlLangImpl8.rst @@ -48,7 +48,7 @@ For example, try adding: extending the type system in all sorts of interesting ways. Simple arrays are very easy and are quite useful for many different applications. Adding them is mostly an exercise in learning how the - LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction + LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction works: it is so nifty/unconventional, it `has its own FAQ <../GetElementPtr.html>`_! If you add support for recursive types (e.g. linked lists), make sure to read the `section in the LLVM diff --git a/docs/yaml2obj.rst b/docs/yaml2obj.rst index 1812e58914ae..d18ce02a336c 100644 --- a/docs/yaml2obj.rst +++ b/docs/yaml2obj.rst @@ -65,6 +65,7 @@ Here's a simplified Kwalify_ schema with an extension to allow alternate types. 
, IMAGE_FILE_MACHINE_AMD64 , IMAGE_FILE_MACHINE_ARM , IMAGE_FILE_MACHINE_ARMNT + , IMAGE_FILE_MACHINE_ARM64 , IMAGE_FILE_MACHINE_EBC , IMAGE_FILE_MACHINE_I386 , IMAGE_FILE_MACHINE_IA64 |