Diffstat (limited to 'docs')
50 files changed, 4055 insertions, 1497 deletions
diff --git a/docs/Atomics.rst b/docs/Atomics.rst index 5f17c6154f3c..6c8303b2830d 100644 --- a/docs/Atomics.rst +++ b/docs/Atomics.rst @@ -18,16 +18,16 @@ clarified in the IR. The atomic instructions are designed specifically to provide readable IR and optimized code generation for the following: -* The new C++0x ``<atomic>`` header. (`C++0x draft available here - <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C1x draft available here +* The new C++11 ``<atomic>`` header. (`C++11 draft available here + <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here <http://www.open-std.org/jtc1/sc22/wg14/>`_.) * Proper semantics for Java-style memory, for both ``volatile`` and regular shared variables. (`Java Specification - <http://java.sun.com/docs/books/jls/third_edition/html/memory.html>`_) + <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_) * gcc-compatible ``__sync_*`` builtins. (`Description - <http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html>`_) + <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_) * Other scenarios with atomic semantics, including ``static`` variables with non-trivial constructors in C++. @@ -115,7 +115,10 @@ memory operation can happen on any thread between the load and store. A ``fence`` provides Acquire and/or Release ordering which is not part of another operation; it is normally used along with Monotonic memory operations. A Monotonic load followed by an Acquire fence is roughly equivalent to an -Acquire load. +Acquire load, and a Monotonic store following a Release fence is roughly +equivalent to a Release store. SequentiallyConsistent fences behave as both +an Acquire and a Release fence, and offer some additional complicated +guarantees, see the C++11 standard for details. Frontends generating atomic instructions generally need to be aware of the target to some degree; atomic instructions are guaranteed to be lock-free, and @@ -177,10 +180,10 @@ Unordered Unordered is the lowest level of atomicity. It essentially guarantees that races produce somewhat sane results instead of having undefined behavior. It also -guarantees the operation to be lock-free, so it do not depend on the data being -part of a special atomic structure or depend on a separate per-process global -lock. Note that code generation will fail for unsupported atomic operations; if -you need such an operation, use explicit locking. +guarantees the operation to be lock-free, so it does not depend on the data +being part of a special atomic structure or depend on a separate per-process +global lock. Note that code generation will fail for unsupported atomic +operations; if you need such an operation, use explicit locking. Relevant standard This is intended to match the Java memory model for shared variables. @@ -221,7 +224,7 @@ essentially guarantees that if you take all the operations affecting a specific address, a consistent ordering exists. Relevant standard - This corresponds to the C++0x/C1x ``memory_order_relaxed``; see those + This corresponds to the C++11/C11 ``memory_order_relaxed``; see those standards for the exact definition. Notes for frontends @@ -251,8 +254,8 @@ Acquire provides a barrier of the sort necessary to acquire a lock to access other memory with normal loads and stores. Relevant standard - This corresponds to the C++0x/C1x ``memory_order_acquire``. It should also be - used for C++0x/C1x ``memory_order_consume``. + This corresponds to the C++11/C11 ``memory_order_acquire``. 
It should also be + used for C++11/C11 ``memory_order_consume``. Notes for frontends If you are writing a frontend which uses this directly, use with caution. @@ -281,7 +284,7 @@ Release is similar to Acquire, but with a barrier of the sort necessary to release a lock. Relevant standard - This corresponds to the C++0x/C1x ``memory_order_release``. + This corresponds to the C++11/C11 ``memory_order_release``. Notes for frontends If you are writing a frontend which uses this directly, use with caution. @@ -307,7 +310,7 @@ AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release barrier (for fences and operations which both read and write memory). Relevant standard - This corresponds to the C++0x/C1x ``memory_order_acq_rel``. + This corresponds to the C++11/C11 ``memory_order_acq_rel``. Notes for frontends If you are writing a frontend which uses this directly, use with caution. @@ -330,7 +333,7 @@ and Release semantics for stores. Additionally, it guarantees that a total ordering exists between all SequentiallyConsistent operations. Relevant standard - This corresponds to the C++0x/C1x ``memory_order_seq_cst``, Java volatile, and + This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and the gcc-compatible ``__sync_*`` builtins which do not specify otherwise. Notes for frontends @@ -368,6 +371,11 @@ Predicates for optimizer writers to query: that they return true for any operation which is volatile or at least Monotonic. +* ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on + orderings. They can be useful for passes that are aware of atomics, for + example to do DSE across a single atomic access, but not across a + release-acquire pair (see MemoryDependencyAnalysis for an example of this) + * Alias analysis: Note that AA will return ModRef for anything Acquire or Release, and for the address accessed by any Monotonic operation. @@ -389,7 +397,9 @@ operations: * DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can be DSE'ed in some cases, but it's tricky to reason about, and not especially - important. + important. It is possible in some case for DSE to operate across a stronger + atomic operation, but it is fairly tricky. DSE delegates this reasoning to + MemoryDependencyAnalysis (which is also used by other passes like GVN). * Folding a load: Any atomic load from a constant global can be constant-folded, because it cannot be observed. Similar reasoning allows scalarrepl with @@ -400,7 +410,8 @@ Atomics and Codegen Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes. On architectures which use barrier instructions for all atomic ordering (like -ARM), appropriate fences are split out as the DAG is built. +ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if +``setInsertFencesForAtomic()`` was used. The MachineMemOperand for all atomic operations is currently marked as volatile; this is not correct in the IR sense of volatile, but CodeGen handles anything @@ -415,11 +426,6 @@ error when given an operation which cannot be implemented. (The LLVM code generator is not very helpful here at the moment, but hopefully that will change.) -The implementation of atomics on LL/SC architectures (like ARM) is currently a -bit of a mess; there is a lot of copy-pasted code across targets, and the -representation is relatively unsuited to optimization (it would be nice to be -able to optimize loops involving cmpxchg etc.). - On x86, all atomic loads generate a ``MOV``. 
SequentiallyConsistent stores generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent fences generate an ``MFENCE``, other fences do not cause any code to be @@ -435,3 +441,19 @@ operation. Loads and stores generate normal instructions. ``cmpxchg`` and ``atomicrmw`` can be represented using a loop with LL/SC-style instructions which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX`` on ARM, etc.). + +It is often easiest for backends to use AtomicExpandPass to lower some of the +atomic constructs. Here are some lowerings it can do: + +* cmpxchg -> loop with load-linked/store-conditional + by overriding ``hasLoadLinkedStoreConditional()``, ``emitLoadLinked()``, + ``emitStoreConditional()`` +* large loads/stores -> ll-sc/cmpxchg + by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()`` +* strong atomic accesses -> monotonic accesses + fences + by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()`` + and ``emitTrailingFence()`` +* atomic rmw -> loop with cmpxchg or load-linked/store-conditional + by overriding ``expandAtomicRMWInIR()`` + +For an example of all of these, look at the ARM backend. diff --git a/docs/BitCodeFormat.rst b/docs/BitCodeFormat.rst index fce1e37cf513..fc553f79174a 100644 --- a/docs/BitCodeFormat.rst +++ b/docs/BitCodeFormat.rst @@ -28,8 +28,9 @@ Unlike XML, the bitstream format is a binary encoding, and unlike XML it provides a mechanism for the file to self-describe "abbreviations", which are effectively size optimizations for the content. -LLVM IR files may be optionally embedded into a `wrapper`_ structure that makes -it easy to embed extra data along with LLVM IR files. +LLVM IR files may be optionally embedded into a `wrapper`_ structure, or in a +`native object file`_. Both of these mechanisms make it easy to embed extra +data along with LLVM IR files. This document first describes the LLVM bitstream format, describes the wrapper format, then describes the record structure used by LLVM IR files. @@ -460,6 +461,19 @@ to the start of the bitcode stream in the file, and the Size field is the size in bytes of the stream. CPUType is a target-specific value that can be used to encode the CPU of the target. +.. _native object file: + +Native Object File Wrapper Format +================================= + +Bitcode files for LLVM IR may also be wrapped in a native object file +(i.e. ELF, COFF, Mach-O). The bitcode must be stored in a section of the +object file named ``.llvmbc``. This wrapper format is useful for accommodating +LTO in compilation pipelines where intermediate objects must be native object +files which contain metadata in other sections. + +Not all tools support this format. + .. _encoding of LLVM IR: LLVM IR Encoding @@ -714,7 +728,7 @@ global variable. The operand fields are: * *unnamed_addr*: If present and non-zero, indicates that the variable has ``unnamed_addr`` -.. _dllstorageclass: +.. _bcdllstorageclass: * *dllstorageclass*: If present, an encoding of the DLL storage class of this variable: @@ -727,7 +741,7 @@ global variable. The operand fields are: MODULE_CODE_FUNCTION Record ^^^^^^^^^^^^^^^^^^^^^^^^^^^ -``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prefix, dllstorageclass]`` +``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prologuedata, dllstorageclass, comdat, prefixdata]`` The ``FUNCTION`` record (code 8) marks the declaration or definition of a function. 
The operand fields are: @@ -770,10 +784,17 @@ function. The operand fields are: * *unnamed_addr*: If present and non-zero, indicates that the function has ``unnamed_addr`` -* *prefix*: If non-zero, the value index of the prefix data for this function, +* *prologuedata*: If non-zero, the value index of the prologue data for this function, + plus 1. + +* *dllstorageclass*: An encoding of the + :ref:`dllstorageclass<bcdllstorageclass>` of this function + +* *comdat*: An encoding of the COMDAT of this function + +* *prefixdata*: If non-zero, the value index of the prefix data for this function, plus 1. -* *dllstorageclass*: An encoding of the `dllstorageclass`_ of this function MODULE_CODE_ALIAS Record ^^^^^^^^^^^^^^^^^^^^^^^^ @@ -791,7 +812,8 @@ fields are * *visibility*: If present, an encoding of the `visibility`_ of the alias -* *dllstorageclass*: If present, an encoding of the `dllstorageclass`_ of the alias +* *dllstorageclass*: If present, an encoding of the + :ref:`dllstorageclass<bcdllstorageclass>` of the alias MODULE_CODE_PURGEVALS Record ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/docs/CMake.rst b/docs/CMake.rst index 957c5d4bb1c5..47cb2f3f04fd 100644 --- a/docs/CMake.rst +++ b/docs/CMake.rst @@ -234,7 +234,7 @@ LLVM-specific variables Enable all compiler warnings. Defaults to ON. **LLVM_ENABLE_PEDANTIC**:BOOL - Enable pedantic mode. This disable compiler specific extensions, is + Enable pedantic mode. This disables compiler specific extensions, if possible. Defaults to ON. **LLVM_ENABLE_WERROR**:BOOL @@ -288,8 +288,14 @@ LLVM-specific variables **LLVM_USE_SANITIZER**:STRING Define the sanitizer used to build LLVM binaries and tests. Possible values - are ``Address``, ``Memory`` and ``MemoryWithOrigins``. Defaults to empty - string. + are ``Address``, ``Memory``, ``MemoryWithOrigins`` and ``Undefined``. + Defaults to empty string. + +**LLVM_PARALLEL_COMPILE_JOBS**:STRING + Define the maximum number of concurrent compilation jobs. + +**LLVM_PARALLEL_LINK_JOBS**:STRING + Define the maximum number of concurrent link jobs. **LLVM_BUILD_DOCS**:BOOL Enables all enabled documentation targets (i.e. Doxgyen and Sphinx targets) to @@ -363,6 +369,10 @@ LLVM-specific variables is enabled). Currently the only target added is ``docs-llvm-man``. Defaults to ON. +**SPHINX_WARNINGS_AS_ERRORS**:BOOL + If enabled then sphinx documentation warnings will be treated as + errors. Defaults to ON. + Executing the test suite ======================== @@ -389,8 +399,6 @@ for a quick solution. Also see the `LLVM-specific variables`_ section for variables used when cross-compiling. -.. 
_Embedding LLVM in your project: - Embedding LLVM in your project ============================== diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt index d310a0a79105..da27627f07e9 100644 --- a/docs/CMakeLists.txt +++ b/docs/CMakeLists.txt @@ -104,3 +104,46 @@ if (LLVM_ENABLE_SPHINX) endif() endif() + +list(FIND LLVM_BINDINGS_LIST ocaml uses_ocaml) +if( NOT uses_ocaml LESS 0 ) + set(doc_targets + ocaml_llvm + ocaml_llvm_all_backends + ocaml_llvm_analysis + ocaml_llvm_bitreader + ocaml_llvm_bitwriter + ocaml_llvm_executionengine + ocaml_llvm_irreader + ocaml_llvm_linker + ocaml_llvm_target + ocaml_llvm_ipo + ocaml_llvm_passmgr_builder + ocaml_llvm_scalar_opts + ocaml_llvm_transform_utils + ocaml_llvm_vectorize + ) + + foreach(llvm_target ${LLVM_TARGETS_TO_BUILD}) + list(APPEND doc_targets ocaml_llvm_${llvm_target}) + endforeach() + + set(odoc_files) + foreach( doc_target ${doc_targets} ) + get_target_property(odoc_file ${doc_target} OCAML_ODOC) + list(APPEND odoc_files -load ${odoc_file}) + endforeach() + + add_custom_target(ocaml_doc + COMMAND ${CMAKE_COMMAND} -E remove_directory ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html + COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html + COMMAND ${OCAMLFIND} ocamldoc -d ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html + -sort -colorize-code -html ${odoc_files}) + + add_dependencies(ocaml_doc ${doc_targets}) + + if (NOT LLVM_INSTALL_TOOLCHAIN_ONLY) + install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html + DESTINATION docs/ocaml/html) + endif() +endif() diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst index 5736e4378d72..7e5b6eb76392 100644 --- a/docs/CodeGenerator.rst +++ b/docs/CodeGenerator.rst @@ -290,10 +290,10 @@ the opcode, the number of operands, the list of implicit register uses and defs, whether the instruction has certain target-independent properties (accesses memory, is commutable, etc), and holds any target-specific flags. -The ``TargetFrameInfo`` class ------------------------------ +The ``TargetFrameLowering`` class +--------------------------------- -The ``TargetFrameInfo`` class is used to provide information about the stack +The ``TargetFrameLowering`` class is used to provide information about the stack frame layout of the target. It holds the direction of stack growth, the known stack alignment on entry to each function, and the offset to the local area. The offset to the local area is the offset from the stack pointer on function @@ -464,7 +464,7 @@ code: mov %EAX, %EDX sar %EDX, 31 idiv %ECX - ret + ret This approach is extremely general (if it can handle the X86 architecture, it can handle anything!) and allows all of the target specific knowledge about the @@ -769,7 +769,9 @@ provide an ordering between nodes that have side effects (such as loads, stores, calls, returns, etc). All nodes that have side effects should take a token chain as input and produce a new one as output. By convention, token chain inputs are always operand #0, and chain results are always the last value -produced by an operation. +produced by an operation. However, after instruction selection, the +machine nodes have their chain after the instruction's operands, and +may be followed by glue nodes. A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is always a marker node with an Opcode of ``ISD::EntryToken``. 
The Root node is @@ -846,6 +848,10 @@ is based on the final SelectionDAG, with nodes that must be scheduled together bundled into a single scheduling-unit node, and with immediate operands and other nodes that aren't relevant for scheduling omitted. +The option ``-filter-view-dags`` allows to select the name of the basic block +that you are interested to visualize and filters all the previous +``view-*-dags`` options. + .. _Build initial DAG: Initial SelectionDAG Construction diff --git a/docs/CodingStandards.rst b/docs/CodingStandards.rst index 3cfa1f66ab4e..0552c7117e2a 100644 --- a/docs/CodingStandards.rst +++ b/docs/CodingStandards.rst @@ -162,6 +162,8 @@ being aware of: * ``std::initializer_list`` (and the constructors and functions that take it as an argument) are not always available, so you cannot (for example) initialize a ``std::vector`` with a braced initializer list. +* ``std::equal()`` (and other algorithms) incorrectly assert in MSVC when given + ``nullptr`` as an iterator. Other than these areas you should assume the standard library is available and working as expected until some build bot tells you otherwise. If you're in an @@ -174,6 +176,25 @@ traits header to emulate it. .. _the libstdc++ manual: http://gcc.gnu.org/onlinedocs/gcc-4.7.3/libstdc++/manual/manual/status.html#status.iso.2011 +Other Languages +--------------- + +Any code written in the Go programming language is not subject to the +formatting rules below. Instead, we adopt the formatting rules enforced by +the `gofmt`_ tool. + +Go code should strive to be idiomatic. Two good sets of guidelines for what +this means are `Effective Go`_ and `Go Code Review Comments`_. + +.. _gofmt: + https://golang.org/cmd/gofmt/ + +.. _Effective Go: + https://golang.org/doc/effective_go.html + +.. _Go Code Review Comments: + https://code.google.com/p/go-wiki/wiki/CodeReviewComments + Mechanical Source Issues ======================== diff --git a/docs/CommandGuide/lit.rst b/docs/CommandGuide/lit.rst index 4d84be63dff2..2708e9de0740 100644 --- a/docs/CommandGuide/lit.rst +++ b/docs/CommandGuide/lit.rst @@ -84,6 +84,14 @@ OUTPUT OPTIONS Do not use curses based progress bar. +.. option:: --show-unsupported + + Show the names of unsupported tests. + +.. option:: --show-xfail + + Show the names of tests that were expected to fail. + .. _execution-options: EXECUTION OPTIONS @@ -262,7 +270,7 @@ Once a test suite is discovered, its config file is loaded. Config files themselves are Python modules which will be executed. When the config file is executed, two important global variables are predefined: -**lit** +**lit_config** The global **lit** configuration object (a *LitConfig* instance), which defines the builtin test formats, global configuration parameters, and other helper @@ -307,14 +315,6 @@ executed, two important global variables are predefined: **root** The root configuration. This is the top-most :program:`lit` configuration in the project. - **on_clone** The config is actually cloned for every subdirectory inside a test - suite, to allow local configuration on a per-directory basis. The *on_clone* - variable can be set to a Python function which will be called whenever a - configuration is cloned (for a subdirectory). The function should takes three - arguments: (1) the parent configuration, (2) the new configuration (which the - *on_clone* function will generally modify), and (3) the test path to the new - directory being scanned. 
- **pipefail** Normally a test using a shell pipe fails if any of the commands on the pipe fail. If this is not desired, setting this variable to false makes the test fail only if the last command in the pipe fails. diff --git a/docs/CommandGuide/llvm-config.rst b/docs/CommandGuide/llvm-config.rst index 0ebb344c06ad..34075d0b3086 100644 --- a/docs/CommandGuide/llvm-config.rst +++ b/docs/CommandGuide/llvm-config.rst @@ -151,7 +151,7 @@ libraries. Useful "virtual" components include: **all** - Includes all LLVM libaries. The default if no components are specified. + Includes all LLVM libraries. The default if no components are specified. diff --git a/docs/CommandGuide/llvm-symbolizer.rst b/docs/CommandGuide/llvm-symbolizer.rst index ce2d9c004353..96720e633f2f 100644 --- a/docs/CommandGuide/llvm-symbolizer.rst +++ b/docs/CommandGuide/llvm-symbolizer.rst @@ -92,6 +92,13 @@ OPTIONS input (see example above). If architecture is not specified in either way, address will not be symbolized. Defaults to empty string. +.. option:: -dsym-hint=<path/to/file.dSYM> + + (Darwin-only flag). If the debug info for a binary isn't present in the default + location, look for the debug info at the .dSYM path provided via the + ``-dsym-hint`` flag. This flag can be used multiple times. + + EXIT STATUS ----------- diff --git a/docs/CommandGuide/opt.rst b/docs/CommandGuide/opt.rst index ad5b62cf6e53..3a050f7d8154 100644 --- a/docs/CommandGuide/opt.rst +++ b/docs/CommandGuide/opt.rst @@ -62,27 +62,14 @@ OPTIONS available. The order in which the options occur on the command line are the order in which they are executed (within pass constraints). -.. option:: -std-compile-opts - - This is short hand for a standard list of *compile time optimization* passes. - It might be useful for other front end compilers as well. To discover the - full set of options available, use the following command: - - .. code-block:: sh - - llvm-as < /dev/null | opt -std-compile-opts -disable-output -debug-pass=Arguments - .. option:: -disable-inlining - This option is only meaningful when :option:`-std-compile-opts` is given. It - simply removes the inlining pass from the standard list. + This option simply removes the inlining pass from the standard list. .. option:: -disable-opt - This option is only meaningful when :option:`-std-compile-opts` is given. It - disables most, but not all, of the :option:`-std-compile-opts`. The ones that - remain are :option:`-verify`, :option:`-lower-setjmp`, and - :option:`-funcresolve`. + This option is only meaningful when :option:`-std-link-opts` is given. It + disables most passes. .. option:: -strip-debug @@ -95,9 +82,7 @@ OPTIONS This option causes opt to add a verify pass after every pass otherwise specified on the command line (including :option:`-verify`). This is useful for cases where it is suspected that a pass is creating an invalid module but - it is not clear which pass is doing it. The combination of - :option:`-std-compile-opts` and :option:`-verify-each` can quickly track down - this kind of problem. + it is not clear which pass is doing it. .. option:: -stats diff --git a/docs/CommandLine.rst b/docs/CommandLine.rst index 1b342e34bf50..1d85215f2af3 100644 --- a/docs/CommandLine.rst +++ b/docs/CommandLine.rst @@ -1630,13 +1630,13 @@ To start out, we declare our new ``FileSizeParser`` class: .. code-block:: c++ - struct FileSizeParser : public cl::basic_parser<unsigned> { + struct FileSizeParser : public cl::parser<unsigned> { // parse - Return true on error. 
- bool parse(cl::Option &O, const char *ArgName, const std::string &ArgValue, + bool parse(cl::Option &O, StringRef ArgName, const std::string &ArgValue, unsigned &Val); }; -Our new class inherits from the ``cl::basic_parser`` template class to fill in +Our new class inherits from the ``cl::parser`` template class to fill in the default, boiler plate code for us. We give it the data type that we parse into, the last argument to the ``parse`` method, so that clients of our custom parser know what object type to pass in to the parse method. (Here we declare @@ -1652,7 +1652,7 @@ implement ``parse`` as: .. code-block:: c++ - bool FileSizeParser::parse(cl::Option &O, const char *ArgName, + bool FileSizeParser::parse(cl::Option &O, StringRef ArgName, const std::string &Arg, unsigned &Val) { const char *ArgStart = Arg.c_str(); char *End; @@ -1698,7 +1698,7 @@ Which adds this to the output of our program: OPTIONS: -help - display available options (-help-hidden for more) ... - -max-file-size=<size> - Maximum file size to accept + -max-file-size=<size> - Maximum file size to accept And we can test that our parse works correctly now (the test program just prints out the max-file-size argument value): diff --git a/docs/CompilerWriterInfo.rst b/docs/CompilerWriterInfo.rst index 606b5f5afec8..a012c329541e 100644 --- a/docs/CompilerWriterInfo.rst +++ b/docs/CompilerWriterInfo.rst @@ -74,6 +74,7 @@ R600 * `AMD Evergreen shader ISA <http://developer.amd.com/wordpress/media/2012/10/AMD_Evergreen-Family_Instruction_Set_Architecture.pdf>`_ * `AMD Cayman/Trinity shader ISA <http://developer.amd.com/wordpress/media/2012/10/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf>`_ * `AMD Southern Islands Series ISA <http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf>`_ +* `AMD Sea Islands Series ISA <http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf>`_ * `AMD GPU Programming Guide <http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf>`_ * `AMD Compute Resources <http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/documentation/>`_ diff --git a/docs/CoverageMappingFormat.rst b/docs/CoverageMappingFormat.rst new file mode 100644 index 000000000000..8fcffb838a3f --- /dev/null +++ b/docs/CoverageMappingFormat.rst @@ -0,0 +1,576 @@ +.. role:: raw-html(raw) + :format: html + +================================= +LLVM Code Coverage Mapping Format +================================= + +.. contents:: + :local: + +Introduction +============ + +LLVM's code coverage mapping format is used to provide code coverage +analysis using LLVM's and Clang's instrumentation based profiling +(Clang's ``-fprofile-instr-generate`` option). + +This document is aimed at those who use LLVM's code coverage mapping to provide +code coverage analysis for their own programs, and for those who would like +to know how it works under the hood. Prior knowledge of how Clang's profile +guided optimization works is useful, but not required. + +We start by showing how to use LLVM and Clang for code coverage analysis, +then we briefly describe LLVM's code coverage mapping format and the +way that Clang and LLVM's code coverage tool work with this format. After +the basics are down, more advanced features of the coverage mapping format +are discussed - such as the data structures, LLVM IR representation and +the binary encoding. 
+ +Quick Start +=========== + +Here's a short story that describes how to generate a code coverage overview +for a sample source file called *test.c*. + +* First, compile an instrumented version of your program using Clang's + ``-fprofile-instr-generate`` option with the additional ``-fcoverage-mapping`` + option: + + ``clang -o test -fprofile-instr-generate -fcoverage-mapping test.c`` +* Then, run the instrumented binary. The runtime will produce a file called + *default.profraw* containing the raw profile instrumentation data: + + ``./test`` +* After that, merge the profile data using the *llvm-profdata* tool: + + ``llvm-profdata merge -o test.profdata default.profraw`` +* Finally, run LLVM's code coverage tool (*llvm-cov*) to produce the code + coverage overview for the sample source file: + + ``llvm-cov show ./test -instr-profile=test.profdata test.c`` + +High Level Overview +=================== + +LLVM's code coverage mapping format is designed to be a self-contained +data format that can be embedded into the LLVM IR and object files. +It's described in this document as a **mapping** format because its goal is +to store the data that is required for a code coverage tool to map between +the specific source ranges in a file and the execution counts obtained +after running the instrumented version of the program. + +The mapping data is used in two places in the code coverage process: + +1. When clang compiles a source file with ``-fcoverage-mapping``, it + generates the mapping information that describes the mapping between the + source ranges and the profiling instrumentation counters. + This information gets embedded into the LLVM IR and conveniently + ends up in the final executable file when the program is linked. + +2. It is also used by *llvm-cov* - the mapping information is extracted from an + object file and is used to associate the execution counts (the values of the + profile instrumentation counters) with the source ranges in a file. + After that, the tool is able to generate various code coverage reports + for the program. + +The coverage mapping format aims to be a "universal format" that would be +suitable for usage by any frontend, and not just by Clang. It also aims to +provide the frontend the possibility of generating the minimal coverage mapping +data in order to reduce the size of the IR and object files - for example, +instead of emitting mapping information for each statement in a function, the +frontend is allowed to group the statements with the same execution count into +regions of code, and emit the mapping information only for those regions. + +Advanced Concepts +================= + +The remainder of this guide is meant to give you insight into the way the +coverage mapping format works. + +The coverage mapping format operates on a per-function level as the +profile instrumentation counters are associated with a specific function. +For each function that requires code coverage, the frontend has to create +coverage mapping data that can map between the source code ranges and +the profile instrumentation counters for that function. + +Mapping Region +-------------- + +The function's coverage mapping data contains an array of mapping regions. +A mapping region stores the `source code range`_ that is covered by this region, +the `file id <coverage file id_>`_, the `coverage mapping counter`_ and +the region's kind. +There are several kinds of mapping regions: + +* Code regions associate portions of source code and `coverage mapping + counters`_. 
They make up the majority of the mapping regions. They are used + by the code coverage tool to compute the execution counts for lines, + highlight the regions of code that were never executed, and to obtain + the various code coverage statistics for a function. + For example: + + :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:40 to 9:2</span> + <span style='background-color:#4A789C'> </span> + <span style='background-color:#4A789C'> if (argc > 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Code Region from 3:17 to 5:4</span> + <span style='background-color:#85C1F5'> printf("%s\n", argv[1]); </span> + <span style='background-color:#85C1F5'> }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Code Region from 5:10 to 7:4</span> + <span style='background-color:#F6D55D'> printf("\n"); </span> + <span style='background-color:#F6D55D'> }</span><span style='background-color:#4A789C'> </span> + <span style='background-color:#4A789C'> return 0; </span> + <span style='background-color:#4A789C'>}</span> + </pre>` +* Skipped regions are used to represent source ranges that were skipped + by Clang's preprocessor. They don't associate with + `coverage mapping counters`_, as the frontend knows that they are never + executed. They are used by the code coverage tool to mark the skipped lines + inside a function as non-code lines that don't have execution counts. + For example: + + :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:12 to 6:2</span> + <span style='background-color:#85C1F5'>#ifdef DEBUG </span> <span class='c1'>// Skipped Region from 2:1 to 4:2</span> + <span style='background-color:#85C1F5'> printf("Hello world"); </span> + <span style='background-color:#85C1F5'>#</span><span style='background-color:#4A789C'>endif </span> + <span style='background-color:#4A789C'> return 0; </span> + <span style='background-color:#4A789C'>}</span> + </pre>` +* Expansion regions are used to represent Clang's macro expansions. They + have an additional property - *expanded file id*. This property can be + used by the code coverage tool to find the mapping regions that are created + as a result of this macro expansion, by checking if their file id matches the + expanded file id. They don't associate with `coverage mapping counters`_, + as the code coverage tool can determine the execution count for this region + by looking up the execution count of the first region with a corresponding + file id. + For example: + + :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int func(int x) </span><span style='background-color:#4A789C'>{ </span> + <span style='background-color:#4A789C'> #define MAX(x,y) </span><span style='background-color:#85C1F5'>((x) > (y)? 
</span><span style='background-color:#F6D55D'>(x)</span><span style='background-color:#85C1F5'> : </span><span style='background-color:#F4BA70'>(y)</span><span style='background-color:#85C1F5'>)</span><span style='background-color:#4A789C'> </span> + <span style='background-color:#4A789C'> return </span><span style='background-color:#7FCA9F'>MAX</span><span style='background-color:#4A789C'>(x, 42); </span> <span class='c1'>// Expansion Region from 3:10 to 3:13</span> + <span style='background-color:#4A789C'>}</span> + </pre>` + +.. _source code range: + +Source Range: +^^^^^^^^^^^^^ + +The source range record contains the starting and ending location of a certain +mapping region. Both locations include the line and the column numbers. + +.. _coverage file id: + +File ID: +^^^^^^^^ + +The file id an integer value that tells us +in which source file or macro expansion is this region located. +It enables Clang to produce mapping information for the code +defined inside macros, like this example demonstrates: + +:raw-html:`<pre class='highlight' style='line-height:initial;'><span>void func(const char *str) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:28 to 6:2 with file id 0</span> +<span style='background-color:#4A789C'> #define PUT </span><span style='background-color:#85C1F5'>printf("%s\n", str)</span><span style='background-color:#4A789C'> </span> <span class='c1'>// 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2</span> +<span style='background-color:#4A789C'> if(*str) </span> +<span style='background-color:#4A789C'> </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1</span> +<span style='background-color:#4A789C'> </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2</span> +<span style='background-color:#4A789C'>}</span> +</pre>` + +.. _coverage mapping counter: +.. _coverage mapping counters: + +Counter: +^^^^^^^^ + +A coverage mapping counter can represents a reference to the profile +instrumentation counter. The execution count for a region with such counter +is determined by looking up the value of the corresponding profile +instrumentation counter. + +It can also represent a binary arithmetical expression that operates on +coverage mapping counters or other expressions. +The execution count for a region with an expression counter is determined by +evaluating the expression's arguments and then adding them together or +subtracting them from one another. 
+In the example below, a subtraction expression is used to compute the execution +count for the compound statement that follows the *else* keyword: + +:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #0</span> +<span style='background-color:#4A789C'> </span> +<span style='background-color:#4A789C'> if (argc > 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #1</span> +<span style='background-color:#85C1F5'> printf("%s\n", argv[1]); </span><span> </span> +<span style='background-color:#85C1F5'> }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)</span> +<span style='background-color:#F6D55D'> printf("\n"); </span> +<span style='background-color:#F6D55D'> }</span><span style='background-color:#4A789C'> </span> +<span style='background-color:#4A789C'> return 0; </span> +<span style='background-color:#4A789C'>}</span> +</pre>` + +Finally, a coverage mapping counter can also represent an execution count of +of zero. The zero counter is used to provide coverage mapping for +unreachable statements and expressions, like in the example below: + +:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span> +<span style='background-color:#4A789C'> return 0; </span> +<span style='background-color:#4A789C'> </span><span style='background-color:#85C1F5'>printf("Hello world!\n")</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Unreachable region's counter is zero</span> +<span style='background-color:#4A789C'>}</span> +</pre>` + +The zero counters allow the code coverage tool to display proper line execution +counts for the unreachable lines and highlight the unreachable code. +Without them, the tool would think that those lines and regions were still +executed, as it doesn't possess the frontend's knowledge. + +LLVM IR Representation +====================== + +The coverage mapping data is stored in the LLVM IR using a single global +constant structure variable called *__llvm_coverage_mapping* +with the *__llvm_covmap* section specifier. + +For example, let’s consider a C file and how it gets compiled to LLVM: + +.. _coverage mapping sample: + +.. code-block:: c + + int foo() { + return 42; + } + int bar() { + return 13; + } + +The coverage mapping variable generated by Clang is: + +.. 
code-block:: llvm + + @__llvm_coverage_mapping = internal constant { i32, i32, i32, i32, [2 x { i8*, i32, i32 }], [40 x i8] } + { i32 2, ; The number of function records + i32 20, ; The length of the string that contains the encoded translation unit filenames + i32 20, ; The length of the string that contains the encoded coverage mapping data + i32 0, ; Coverage mapping format version + [2 x { i8*, i32, i32 }] [ ; Function records + { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_foo, i32 0, i32 0), ; Function's name + i32 3, ; Function's name length + i32 9 ; Function's encoded coverage mapping data string length + }, + { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_bar, i32 0, i32 0), ; Function's name + i32 3, ; Function's name length + i32 9 ; Function's encoded coverage mapping data string length + }], + [40 x i8] c"..." ; Encoded data (dissected later) + }, section "__llvm_covmap", align 8 + +Version: +-------- + +The coverage mapping version number can have the following values: + +* 0 — The first (current) version of the coverage mapping format. + +.. _function records: + +Function record: +---------------- + +A function record is a structure of the following type: + +.. code-block:: llvm + + { i8*, i32, i32 } + +It contains the pointer to the function's name, function's name length, +and the length of the encoded mapping data for that function. + +Encoded data: +------------- + +The encoded data is stored in a single string that contains +the encoded filenames used by this translation unit and the encoded coverage +mapping data for each function in this translation unit. + +The encoded data has the following structure: + +``[filenames, coverageMappingDataForFunctionRecord0, coverageMappingDataForFunctionRecord1, ..., padding]`` + +If necessary, the encoded data is padded with zeroes so that the size +of the data string is rounded up to the nearest multiple of 8 bytes. + +Dissecting the sample: +^^^^^^^^^^^^^^^^^^^^^^ + +Here's an overview of the encoded data that was stored in the +IR for the `coverage mapping sample`_ that was shown earlier: + +* The IR contains the following string constant that represents the encoded + coverage mapping data for the sample translation unit: + + .. code-block:: llvm + + c"\01\12/Users/alex/test.c\01\00\00\01\01\01\0C\02\02\01\00\00\01\01\04\0C\02\02\00\00" + +* The string contains values that are encoded in the LEB128 format, which is + used throughout for storing integers. It also contains a string value. + +* The length of the substring that contains the encoded translation unit + filenames is the value of the second field in the *__llvm_coverage_mapping* + structure, which is 20, thus the filenames are encoded in this string: + + .. code-block:: llvm + + c"\01\12/Users/alex/test.c" + + This string contains the following data: + + * Its first byte has a value of ``0x01``. It stores the number of filenames + contained in this string. + * Its second byte stores the length of the first filename in this string. + * The remaining 18 bytes are used to store the first filename. + +* The length of the substring that contains the encoded coverage mapping data + for the first function is the value of the third field in the first + structure in an array of `function records`_ stored in the + fifth field of the *__llvm_coverage_mapping* structure, which is the 9. + Therefore, the coverage mapping for the first function record is encoded + in this string: + + .. 
code-block:: llvm + + c"\01\00\00\01\01\01\0C\02\02" + + This string consists of the following bytes: + + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x01`` | The number of file ids used by this function. There is only one file id used by the mapping data in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x00`` | An index into the filenames array which corresponds to the file "/Users/alex/test.c". | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x00`` | The number of counter expressions used by this function. This function doesn't use any expressions. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x01`` | The number of mapping regions that are stored in an array for the function's file id #0. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x01`` | The coverage mapping counter for the first region in this function. The value of 1 tells us that it's a coverage | + | | mapping counter that is a reference ot the profile instrumentation counter with an index of 0. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x01`` | The starting line of the first mapping region in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x0C`` | The starting column of the first mapping region in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x02`` | The ending line of the first mapping region in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x02`` | The ending column of the first mapping region in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + +* The length of the substring that contains the encoded coverage mapping data + for the second function record is also 9. It's structured like the mapping data + for the first function record. + +* The two trailing bytes are zeroes and are used to pad the coverage mapping + data to give it the 8 byte alignment. + +Encoding +======== + +The per-function coverage mapping data is encoded as a stream of bytes, +with a simple structure. The structure consists of the encoding +`types <cvmtypes_>`_ like variable-length unsigned integers, that +are used to encode `File ID Mapping`_, `Counter Expressions`_ and +the `Mapping Regions`_. + +The format of the structure follows: + + ``[file id mapping, counter expressions, mapping regions]`` + +The translation unit filenames are encoded using the same encoding +`types <cvmtypes_>`_ as the per-function coverage mapping data, with the +following structure: + + ``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]`` + +.. 
_cvmtypes: + +Types +----- + +This section describes the basic types that are used by the encoding format +and can appear after ``:`` in the ``[foo : type]`` description. + +.. _LEB128: + +LEB128 +^^^^^^ + +LEB128 is an unsigned integer value that is encoded using DWARF's LEB128 +encoding, optimizing for the case where values are small +(1 byte for values less than 128). + +.. _strings: + +Strings +^^^^^^^ + +``[length : LEB128, characters...]`` + +String values are encoded with a `LEB value <LEB128_>`_ for the length +of the string and a sequence of bytes for its characters. + +.. _file id mapping: + +File ID Mapping +--------------- + +``[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]`` + +File id mapping in a function's coverage mapping stream +contains the indices into the translation unit's filenames array. + +Counter +------- + +``[value : LEB128]`` + +A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_. +It is composed of two things --- the `tag <counter-tag_>`_ +which is stored in the lowest 2 bits, and the `counter data`_ which is stored +in the remaining bits. + +.. _counter-tag: + +Tag: +^^^^ + +The counter's tag encodes the counter's kind +and, if the counter is an expression, the expression's kind. +The possible tag values are: + +* 0 - The counter is zero. + +* 1 - The counter is a reference to the profile instrumentation counter. + +* 2 - The counter is a subtraction expression. + +* 3 - The counter is an addition expression. + +.. _counter data: + +Data: +^^^^^ + +The counter's data is interpreted in the following manner: + +* When the counter is a reference to the profile instrumentation counter, + then the counter's data is the id of the profile counter. +* When the counter is an expression, then the counter's data + is the index into the array of counter expressions. + +.. _Counter Expressions: + +Counter Expressions +------------------- + +``[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]`` + +Counter expressions consist of two counters as they +represent binary arithmetic operations. +The expression's kind is determined from the `tag <counter-tag_>`_ of the +counter that references this expression. + +.. _Mapping Regions: + +Mapping Regions +--------------- + +``[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]`` + +The mapping regions are stored in an array of sub-arrays where every +region in a particular sub-array has the same file id. + +The file id for a sub-array of regions is the index of that +sub-array in the main array, e.g. the first sub-array will have the file id +of 0. + +Sub-Array of Regions +^^^^^^^^^^^^^^^^^^^^ + +``[numRegions : LEB128, region0, region1, ...]`` + +The mapping regions for a specific file id are stored in an array that is +sorted in ascending order by the region's starting location. + +Mapping Region +^^^^^^^^^^^^^^ + +``[header, source range]`` + +The mapping region record contains two sub-records --- +the `header`_, which stores the counter and/or the region's kind, +and the `source range`_ that contains the starting and ending +location of this region. + +.. _header: + +Header +^^^^^^ + +``[counter]`` + +or + +``[pseudo-counter]`` + +The header encodes the region's counter and the region's kind. + +The value of the counter's tag distinguishes between the counters and +pseudo-counters --- if the tag is zero, then this header contains a +pseudo-counter; otherwise this header contains an ordinary counter. 
+ +Counter: +"""""""" + +A mapping region whose header has a counter with a non-zero tag is +a code region. + +Pseudo-Counter: +""""""""""""""" + +``[value : LEB128]`` + +A pseudo-counter is stored in a single `LEB value <LEB128_>`_, just like +the ordinary counter. It has the following interpretation: + +* bits 0-1: tag, which is always 0. + +* bit 2: expansionRegionTag. If this bit is set, then this mapping region + is an expansion region. + +* remaining bits: data. If this region is an expansion region, then the data + contains the expanded file id of that region. + + Otherwise, the data contains the region's kind. The possible region + kind values are: + + * 0 - This mapping region is a code region with a counter of zero. + * 2 - This mapping region is a skipped region. + +.. _source range: + +Source Range +^^^^^^^^^^^^ + +``[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]`` + +The source range record contains the following fields: + +* *deltaLineStart*: The difference between the starting line of the + current mapping region and the starting line of the previous mapping region. + + If the current mapping region is the first region in the current + sub-array, then it stores the starting line of that region. + +* *columnStart*: The starting column of the mapping region. + +* *numLines*: The difference between the ending line and the starting line + of the current mapping region. + +* *columnEnd*: The ending column of the mapping region. diff --git a/docs/DeveloperPolicy.rst b/docs/DeveloperPolicy.rst index 74a89791128c..508a04fd2d59 100644 --- a/docs/DeveloperPolicy.rst +++ b/docs/DeveloperPolicy.rst @@ -436,6 +436,29 @@ list, development list, or LLVM bug tracker component. If someone sends you a patch privately, encourage them to submit it to the appropriate list first. +IR Backwards Compatibility +-------------------------- + +When the IR format has to be changed, keep in mind that we try to maintain some +backwards compatibility. The rules are intended as a balance between convenience +for llvm users and not imposing a big burden on llvm developers: + +* The textual format is not backwards compatible. We don't change it too often, + but there are no specific promises. + +* The bitcode format produced by a X.Y release will be readable by all following + X.Z releases and the (X+1).0 release. + +* Newer releases can ignore features from older releases, but they cannot + miscompile them. For example, if nsw is ever replaced with something else, + dropping it would be a valid way to upgrade the IR. + +* Debug metadata is special in that it is currently dropped during upgrades. + +* Non-debug metadata is defined to be safe to drop, so a valid way to upgrade + it is to drop it. That is not very user friendly and a bit more effort is + expected, but no promises are made. + .. _copyright-license-patents: Copyright, License, and Patents diff --git a/docs/ExtendingLLVM.rst b/docs/ExtendingLLVM.rst index 60cbf011e573..2552c075c9c7 100644 --- a/docs/ExtendingLLVM.rst +++ b/docs/ExtendingLLVM.rst @@ -58,7 +58,7 @@ function and then be turned into an instruction if warranted. If it is possible to constant fold your intrinsic, add support to it in the ``canConstantFoldCallTo`` and ``ConstantFoldCall`` functions. -#. ``llvm/test/Regression/*``: +#. ``llvm/test/*``: Add test cases for your test cases to the test suite @@ -164,10 +164,10 @@ complicated behavior in a single node (rotate). #. TODO: document complex patterns. -#. 
``llvm/test/Regression/CodeGen/*``: +#. ``llvm/test/CodeGen/*``: Add test cases for your new node to the test suite. - ``llvm/test/Regression/CodeGen/X86/bswap.ll`` is a good example. + ``llvm/test/CodeGen/X86/bswap.ll`` is a good example. Adding a new instruction ======================== @@ -217,7 +217,7 @@ Adding a new instruction add support for your instruction to code generators, or add a lowering pass. -#. ``llvm/test/Regression/*``: +#. ``llvm/test/*``: add your test cases to the test suite. diff --git a/docs/GarbageCollection.rst b/docs/GarbageCollection.rst index dc6dab1d336c..49d3496748e2 100644 --- a/docs/GarbageCollection.rst +++ b/docs/GarbageCollection.rst @@ -923,7 +923,7 @@ a realistic example: void MyGCPrinter::finishAssembly(AsmPrinter &AP) { MCStreamer &OS = AP.OutStreamer; - unsigned IntPtrSize = AP.TM.getDataLayout()->getPointerSize(); + unsigned IntPtrSize = AP.TM.getSubtargetImpl()->getDataLayout()->getPointerSize(); // Put this in the data section. OS.SwitchSection(AP.getObjFileLowering().getDataSection()); diff --git a/docs/GettingStarted.rst b/docs/GettingStarted.rst index d409f623f868..316f1f7380f0 100644 --- a/docs/GettingStarted.rst +++ b/docs/GettingStarted.rst @@ -115,7 +115,6 @@ LLVM is known to work on the following host platforms: ================== ===================== ============= OS Arch Compilers ================== ===================== ============= -AuroraUX x86\ :sup:`1` GCC Linux x86\ :sup:`1` GCC, Clang Linux amd64 GCC, Clang Linux ARM\ :sup:`4` GCC, Clang @@ -165,7 +164,7 @@ Package Version Notes =========================================================== ============ ========================================== `GNU Make <http://savannah.gnu.org/projects/make>`_ 3.79, 3.79.1 Makefile/build processor `GCC <http://gcc.gnu.org/>`_ >=4.7.0 C/C++ compiler\ :sup:`1` -`python <http://www.python.org/>`_ >=2.5 Automated test suite\ :sup:`2` +`python <http://www.python.org/>`_ >=2.7 Automated test suite\ :sup:`2` `GNU M4 <http://savannah.gnu.org/projects/m4>`_ 1.4 Macro processor for configuration\ :sup:`3` `GNU Autoconf <http://www.gnu.org/software/autoconf/>`_ 2.60 Configuration script builder\ :sup:`3` `GNU Automake <http://www.gnu.org/software/automake/>`_ 1.9.6 aclocal macro generator\ :sup:`3` @@ -331,10 +330,23 @@ of this information from. .. _GCC wiki entry: http://gcc.gnu.org/wiki/InstallingGCC -Once you have a GCC toolchain, use it as your host compiler. Things should -generally "just work". You may need to pass a special linker flag, -``-Wl,-rpath,$HOME/toolchains/lib`` or some variant thereof to get things to -find the libstdc++ DSO in this toolchain. +Once you have a GCC toolchain, configure your build of LLVM to use the new +toolchain for your host compiler and C++ standard library. Because the new +version of libstdc++ is not on the system library search path, you need to pass +extra linker flags so that it can be found at link time (``-L``) and at runtime +(``-rpath``). If you are using CMake, this invocation should produce working +binaries: + +.. code-block:: console + + % mkdir build + % cd build + % CC=$HOME/toolchains/bin/gcc CXX=$HOME/toolchains/bin/g++ \ + cmake .. -DCMAKE_CXX_LINK_FLAGS="-Wl,-rpath,$HOME/toolchains/lib64 -L$HOME/toolchains/lib64" + +If you fail to set rpath, most LLVM binaries will fail on startup with a message +from the loader similar to ``libstdc++.so.6: version `GLIBCXX_3.4.20' not +found``. This means you need to tweak the -rpath linker flag. 
When you build Clang, you will need to give *it* access to modern C++11 standard library in order to use it as your new host in part of a bootstrap. @@ -1006,7 +1018,7 @@ This directory contains most of the source files of the LLVM system. In LLVM, almost all code exists in libraries, making it very easy to share code among the different `tools`_. -``llvm/lib/VMCore/`` +``llvm/lib/IR/`` This directory holds the core LLVM source files that implement core classes like Instruction and BasicBlock. diff --git a/docs/GettingStartedVS.rst b/docs/GettingStartedVS.rst index d914cc176dfd..fa20912be22c 100644 --- a/docs/GettingStartedVS.rst +++ b/docs/GettingStartedVS.rst @@ -57,8 +57,8 @@ You will also need the `CMake <http://www.cmake.org/>`_ build system since it generates the project files you will use to build with. If you would like to run the LLVM tests you will need `Python -<http://www.python.org/>`_. Versions 2.4-2.7 are known to work. You will need -`GnuWin32 <http://gnuwin32.sourceforge.net/>`_ tools, too. +<http://www.python.org/>`_. Version 2.7 and newer are known to work. You will +need `GnuWin32 <http://gnuwin32.sourceforge.net/>`_ tools, too. Do not install the LLVM directory tree into a path containing spaces (e.g. ``C:\Documents and Settings\...``) as the configure step will fail. diff --git a/docs/GoldPlugin.rst b/docs/GoldPlugin.rst index 28b202adf084..6328934b37b3 100644 --- a/docs/GoldPlugin.rst +++ b/docs/GoldPlugin.rst @@ -44,9 +44,11 @@ will either need to build gold or install a version with plugin support. the ``-plugin`` option. Running ``make`` will additionally build ``build/binutils/ar`` and ``nm-new`` binaries supporting plugins. -* Build the LLVMgold plugin: Configure LLVM with - ``--with-binutils-include=/path/to/binutils/include`` and run - ``make``. +* Build the LLVMgold plugin. If building with autotools, run configure with + ``--with-binutils-include=/path/to/binutils/include`` and run ``make``. + If building with CMake, run cmake with + ``-DLLVM_BINUTILS_INCDIR=/path/to/binutils/include``. The correct include + path will contain the file ``plugin-api.h``. Usage ===== diff --git a/docs/HowToSubmitABug.rst b/docs/HowToSubmitABug.rst index 702dc0c6112b..9f997d2757dd 100644 --- a/docs/HowToSubmitABug.rst +++ b/docs/HowToSubmitABug.rst @@ -89,7 +89,7 @@ Then run: .. code-block:: bash - opt -std-compile-opts -debug-pass=Arguments foo.bc -disable-output + opt -O3 -debug-pass=Arguments foo.bc -disable-output This command should do two things: it should print out a list of passes, and then it should crash in the same way as clang. If it doesn't crash, please diff --git a/docs/LangRef.rst b/docs/LangRef.rst index 17ccf9b0bac9..b18474915e18 100644 --- a/docs/LangRef.rst +++ b/docs/LangRef.rst @@ -75,11 +75,12 @@ identifiers, for different purposes: #. Named values are represented as a string of characters with their prefix. For example, ``%foo``, ``@DivisionByZero``, ``%a.really.long.identifier``. The actual regular expression used is - '``[%@][a-zA-Z$._][a-zA-Z$._0-9]*``'. Identifiers which require other + '``[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*``'. Identifiers that require other characters in their names can be surrounded with quotes. Special characters may be escaped using ``"\xx"`` where ``xx`` is the ASCII code for the character in hexadecimal. In this way, any character can - be used in a name value, even quotes themselves. + be used in a name value, even quotes themselves. The ``"\01"`` prefix + can be used on global variables to suppress mangling. #. 
Unnamed values are represented as an unsigned numeric value with their prefix. For example, ``%12``, ``@2``, ``%44``. #. Constants, which are described in the section Constants_ below. @@ -128,9 +129,10 @@ lexical features of LLVM: #. Unnamed temporaries are created when the result of a computation is not assigned to a named value. #. Unnamed temporaries are numbered sequentially (using a per-function - incrementing counter, starting with 0). Note that basic blocks are - included in this numbering. For example, if the entry basic block is not - given a label name, then it will get number 0. + incrementing counter, starting with 0). Note that basic blocks and unnamed + function parameters are included in this numbering. For example, if the + entry basic block is not given a label name and all function parameters are + named, then it will get number 0. It also shows a convention that we follow in this document. When demonstrating instructions, we will follow an instruction with a comment @@ -168,8 +170,8 @@ symbol table entries. Here is an example of the "hello world" module: } ; Named metadata - !1 = metadata !{i32 42} - !foo = !{!1, null} + !0 = !{i32 42, null, !"string"} + !foo = !{!0} This example is made up of a :ref:`global variable <globalvars>` named "``.str``", an external declaration of the "``puts``" function, a @@ -500,7 +502,7 @@ Structure Types LLVM IR allows you to specify both "identified" and "literal" :ref:`structure types <t_struct>`. Literal types are uniqued structurally, but identified types are never uniqued. An :ref:`opaque structural type <t_opaque>` can also be used -to forward declare a type which is not yet available. +to forward declare a type that is not yet available. An example of a identified structure specification is: @@ -594,7 +596,8 @@ Syntax:: [@<GlobalVarName> =] [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] [AddrSpace] [ExternallyInitialized] <global | constant> <Type> [<InitializerConstant>] - [, section "name"] [, align <Alignment>] + [, section "name"] [, comdat [($name)]] + [, align <Alignment>] For example, the following defines a global in a numbered address space with an initializer, section, and alignment: @@ -631,7 +634,8 @@ name, a (possibly empty) argument list (each with optional :ref:`parameter attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`, an optional section, an optional alignment, an optional :ref:`comdat <langref_comdats>`, -an optional :ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`, an opening +an optional :ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`, +an optional :ref:`prologue <prologuedata>`, an opening curly brace, a list of basic blocks, and a closing curly brace. LLVM function declarations consist of the "``declare``" keyword, an @@ -641,7 +645,8 @@ an optional :ref:`calling convention <callingconv>`, an optional ``unnamed_addr`` attribute, a return type, an optional :ref:`parameter attribute <paramattrs>` for the return type, a function name, a possibly empty list of arguments, an optional alignment, an optional -:ref:`garbage collector name <gc>` and an optional :ref:`prefix <prefixdata>`. +:ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`, +and an optional :ref:`prologue <prologuedata>`. A function definition contains a list of basic blocks, forming the CFG (Control Flow Graph) for the function. 
Each basic block may optionally start with a label @@ -677,8 +682,16 @@ Syntax:: define [linkage] [visibility] [DLLStorageClass] [cconv] [ret attrs] <ResultType> @<FunctionName> ([argument list]) - [unnamed_addr] [fn Attrs] [section "name"] [comdat $<ComdatName>] - [align N] [gc] [prefix Constant] { ... } + [unnamed_addr] [fn Attrs] [section "name"] [comdat [($name)]] + [align N] [gc] [prefix Constant] [prologue Constant] { ... } + +The argument list is a comma seperated sequence of arguments where each +argument is of the following form + +Syntax:: + + <type> [parameter Attrs] [name] + .. _langref_aliases: @@ -697,7 +710,7 @@ Aliases may have an optional :ref:`linkage type <linkage>`, an optional Syntax:: - @<Name> = [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] alias [Linkage] <AliaseeTy> @<Aliasee> + @<Name> = [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] alias <AliaseeTy> @<Aliasee> The linkage must be one of ``private``, ``internal``, ``linkonce``, ``weak``, ``linkonce_odr``, ``weak_odr``, ``external``. Note that some system linkers @@ -727,7 +740,7 @@ Comdats Comdat IR provides access to COFF and ELF object file COMDAT functionality. -Comdats have a name which represents the COMDAT key. All global objects which +Comdats have a name which represents the COMDAT key. All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key. Aliases are placed in the same COMDAT that their aliasee computes to, if any. @@ -763,17 +776,26 @@ the COMDAT key's section is the largest: .. code-block:: llvm $foo = comdat largest - @foo = global i32 2, comdat $foo + @foo = global i32 2, comdat($foo) - define void @bar() comdat $foo { + define void @bar() comdat($foo) { ret void } +As a syntactic sugar the ``$name`` can be omitted if the name is the same as +the global name: + +.. code-block:: llvm + + $foo = comdat any + @foo = global i32 2, comdat + + In a COFF object file, this will create a COMDAT section with selection kind ``IMAGE_COMDAT_SELECT_LARGEST`` containing the contents of the ``@foo`` symbol and another COMDAT section with selection kind ``IMAGE_COMDAT_SELECT_ASSOCIATIVE`` which is associated with the first COMDAT -section and contains the contents of the ``@baz`` symbol. +section and contains the contents of the ``@bar`` symbol. There are some restrictions on the properties of the global object. It, or an alias to it, must have the same name as the COMDAT group when @@ -791,8 +813,8 @@ For example: $foo = comdat any $bar = comdat any - @g1 = global i32 42, section "sec", comdat $foo - @g2 = global i32 42, section "sec", comdat $bar + @g1 = global i32 42, section "sec", comdat($foo) + @g2 = global i32 42, section "sec", comdat($bar) From the object file perspective, this requires the creation of two sections with the same name. This is necessary because both globals belong to different @@ -815,9 +837,9 @@ operands for a named metadata. Syntax:: ; Some unnamed metadata nodes, which are referenced by the named metadata. - !0 = metadata !{metadata !"zero"} - !1 = metadata !{metadata !"one"} - !2 = metadata !{metadata !"two"} + !0 = !{!"zero"} + !1 = !{!"one"} + !2 = !{!"two"} ; A named metadata. !name = !{!0, !1, !2} @@ -921,26 +943,36 @@ Currently, only the following parameter attributes are defined: the first parameter. This is not a valid attribute for return values. 
+``align <n>`` + This indicates that the pointer value may be assumed by the optimizer to + have the specified alignment. + + Note that this attribute has additional semantics when combined with the + ``byval`` attribute. + .. _noalias: ``noalias`` - This indicates that pointer values :ref:`based <pointeraliasing>` on - the argument or return value do not alias pointer values which are - not *based* on it, ignoring certain "irrelevant" dependencies. For a - call to the parent function, dependencies between memory references - from before or after the call and from those during the call are - "irrelevant" to the ``noalias`` keyword for the arguments and return - value used in that call. The caller shares the responsibility with - the callee for ensuring that these requirements are met. For further - details, please see the discussion of the NoAlias response in :ref:`alias - analysis <Must, May, or No>`. + This indicates that objects accessed via pointer values + :ref:`based <pointeraliasing>` on the argument or return value are not also + accessed, during the execution of the function, via pointer values not + *based* on the argument or return value. The attribute on a return value + also has additional semantics described below. The caller shares the + responsibility with the callee for ensuring that these requirements are met. + For further details, please see the discussion of the NoAlias response in + :ref:`alias analysis <Must, May, or No>`. Note that this definition of ``noalias`` is intentionally similar - to the definition of ``restrict`` in C99 for function arguments, - though it is slightly weaker. + to the definition of ``restrict`` in C99 for function arguments. For function return values, C99's ``restrict`` is not meaningful, - while LLVM's ``noalias`` is. + while LLVM's ``noalias`` is. Furthermore, the semantics of the ``noalias`` + attribute on return values are stronger than the semantics of the attribute + when used on function arguments. On function return values, the ``noalias`` + attribute indicates that the function acts like a system memory allocation + function, returning a pointer to allocated storage disjoint from the + storage for any other object accessible to the caller. + ``nocapture`` This indicates that the callee does not make any copies of the pointer that outlive the callee itself. This is not a valid @@ -993,7 +1025,7 @@ string: define void @f() gc "name" { ... } The compiler declares the supported values of *name*. Specifying a -collector which will cause the compiler to alter its output in order to +collector will cause the compiler to alter its output in order to support the named garbage collection algorithm. .. _prefixdata: @@ -1001,47 +1033,79 @@ support the named garbage collection algorithm. Prefix Data ----------- -Prefix data is data associated with a function which the code generator -will emit immediately before the function body. The purpose of this feature -is to allow frontends to associate language-specific runtime metadata with -specific functions and make it available through the function pointer while -still allowing the function pointer to be called. To access the data for a -given function, a program may bitcast the function pointer to a pointer to -the constant's type. This implies that the IR symbol points to the start -of the prefix data. +Prefix data is data associated with a function which the code +generator will emit immediately before the function's entrypoint. 
+The purpose of this feature is to allow frontends to associate +language-specific runtime metadata with specific functions and make it +available through the function pointer while still allowing the +function pointer to be called. + +To access the data for a given function, a program may bitcast the +function pointer to a pointer to the constant's type and dereference +index -1. This implies that the IR symbol points just past the end of +the prefix data. For instance, take the example of a function annotated +with a single ``i32``, + +.. code-block:: llvm + + define void @f() prefix i32 123 { ... } + +The prefix data can be referenced as, -To maintain the semantics of ordinary function calls, the prefix data must +.. code-block:: llvm + + %0 = bitcast *void () @f to *i32 + %a = getelementptr inbounds *i32 %0, i32 -1 + %b = load i32* %a + +Prefix data is laid out as if it were an initializer for a global variable +of the prefix data's type. The function will be placed such that the +beginning of the prefix data is aligned. This means that if the size +of the prefix data is not a multiple of the alignment size, the +function's entrypoint will not be aligned. If alignment of the +function's entrypoint is desired, padding must be added to the prefix +data. + +A function may have prefix data but no body. This has similar semantics +to the ``available_externally`` linkage in that the data may be used by the +optimizers but will not be emitted in the object file. + +.. _prologuedata: + +Prologue Data +------------- + +The ``prologue`` attribute allows arbitrary code (encoded as bytes) to +be inserted prior to the function body. This can be used for enabling +function hot-patching and instrumentation. + +To maintain the semantics of ordinary function calls, the prologue data must have a particular format. Specifically, it must begin with a sequence of bytes which decode to a sequence of machine instructions, valid for the module's target, which transfer control to the point immediately succeeding -the prefix data, without performing any other visible action. This allows +the prologue data, without performing any other visible action. This allows the inliner and other passes to reason about the semantics of the function -definition without needing to reason about the prefix data. Obviously this -makes the format of the prefix data highly target dependent. +definition without needing to reason about the prologue data. Obviously this +makes the format of the prologue data highly target dependent. -Prefix data is laid out as if it were an initializer for a global variable -of the prefix data's type. No padding is automatically placed between the -prefix data and the function body. If padding is required, it must be part -of the prefix data. - -A trivial example of valid prefix data for the x86 architecture is ``i8 144``, +A trivial example of valid prologue data for the x86 architecture is ``i8 144``, which encodes the ``nop`` instruction: .. code-block:: llvm - define void @f() prefix i8 144 { ... } + define void @f() prologue i8 144 { ... } -Generally prefix data can be formed by encoding a relative branch instruction -which skips the metadata, as in this example of valid prefix data for the +Generally prologue data can be formed by encoding a relative branch instruction +which skips the metadata, as in this example of valid prologue data for the x86_64 architecture, where the first two bytes encode ``jmp .+10``: .. 
code-block:: llvm %0 = type <{ i8, i8, i8* }> - define void @f() prefix %0 <{ i8 235, i8 8, i8* @md}> { ... } + define void @f() prologue %0 <{ i8 235, i8 8, i8* @md}> { ... } -A function may have prefix data but no body. This has similar semantics +A function may have prologue data but no body. This has similar semantics to the ``available_externally`` linkage in that the data may be used by the optimizers but will not be emitted in the object file. @@ -1109,7 +1173,7 @@ example: This indicates that the callee function at a call site should be recognized as a built-in function, even though the function's declaration uses the ``nobuiltin`` attribute. This is only valid at call sites for - direct calls to functions which are declared with the ``nobuiltin`` + direct calls to functions that are declared with the ``nobuiltin`` attribute. ``cold`` This attribute indicates that this function is rarely called. When @@ -1604,7 +1668,7 @@ Given that definition, R\ :sub:`byte` is defined as follows: - If R is volatile, the result is target-dependent. (Volatile is supposed to give guarantees which can support ``sig_atomic_t`` in - C/C++, and may be used for accesses to addresses which do not behave + C/C++, and may be used for accesses to addresses that do not behave like normal memory. It does not generally provide cross-thread synchronization.) - Otherwise, if there is no write to the same byte that happens before @@ -1692,7 +1756,7 @@ For a simpler introduction to the ordering constraints, see the address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``. ``seq_cst`` (sequentially consistent) In addition to the guarantees of ``acq_rel`` (``acquire`` for an - operation which only reads, ``release`` for an operation which only + operation that only reads, ``release`` for an operation that only writes), there is a global total order on all sequentially-consistent operations on all addresses, which is consistent with the *happens-before* partial order and with the @@ -1741,6 +1805,52 @@ otherwise unsafe floating point operations dramatically change results in floating point (e.g. reassociate). This flag implies all the others. +.. _uselistorder: + +Use-list Order Directives +------------------------- + +Use-list directives encode the in-memory order of each use-list, allowing the +order to be recreated. ``<order-indexes>`` is a comma-separated list of +indexes that are assigned to the referenced value's uses. The referenced +value's use-list is immediately sorted by these indexes. + +Use-list directives may appear at function scope or global scope. They are not +instructions, and have no effect on the semantics of the IR. When they're at +function scope, they must appear after the terminator of the final basic block. + +If basic blocks have their address taken via ``blockaddress()`` expressions, +``uselistorder_bb`` can be used to reorder their use-lists from outside their +function's scope. + +:Syntax: + +:: + + uselistorder <ty> <value>, { <order-indexes> } + uselistorder_bb @function, %block { <order-indexes> } + +:Examples: + +:: + + define void @foo(i32 %arg1, i32 %arg2) { + entry: + ; ... instructions ... + bb: + ; ... instructions ... + + ; At function scope. + uselistorder i32 %arg1, { 1, 0, 2 } + uselistorder label %bb, { 1, 0 } + } + + ; At global scope. + uselistorder i32* @global, { 1, 2, 0 } + uselistorder i32 7, { 1, 0 } + uselistorder i32 (i32) @bar, { 1, 0 } + uselistorder_bb @foo, %bb, { 5, 1, 3, 2, 0, 4 } + .. _typesystem: Type System @@ -1959,8 +2069,8 @@ type. 
Vector types are considered :ref:`first class <t_firstclass>`. < <# elements> x <elementtype> > The number of elements is a constant integer value larger than 0; -elementtype may be any integer or floating point type, or a pointer to -these types. Vectors of size zero are not allowed. +elementtype may be any integer, floating point or pointer type. Vectors +of size zero are not allowed. :Examples: @@ -2213,7 +2323,9 @@ constants and smaller complex constants. square brackets (``[]``)). For example: "``[ i32 42, i32 11, i32 74 ]``". Array constants must have :ref:`array type <t_array>`, and the number and types of elements must - match those specified by the type. + match those specified by the type. As a special case, character array + constants may also be represented as a double-quoted string using the ``c`` + prefix. For example: "``c"Hello World\0A\00"``". **Vector constants** Vector constants are represented with notation similar to vector type definitions (a comma separated list of elements, surrounded by @@ -2228,11 +2340,11 @@ constants and smaller complex constants. having to print large zero initializers (e.g. for large arrays) and is always exactly equivalent to using explicit zero initializers. **Metadata node** - A metadata node is a structure-like constant with :ref:`metadata - type <t_metadata>`. For example: - "``metadata !{ i32 0, metadata !"test" }``". Unlike other - constants that are meant to be interpreted as part of the - instruction stream, metadata is a place to attach additional + A metadata node is a constant tuple without types. For example: + "``!{!0, !{!2, !0}, !"test"}``". Metadata can reference constant values, + for example: "``!{!0, i32 0, i8* @global, i64 (i64)* @function, !"str"}``". + Unlike other typed constants that are meant to be interpreted as part of + the instruction stream, metadata is a place to attach additional information such as debug info. Global Variable and Function Addresses @@ -2330,7 +2442,7 @@ allowed to assume that the '``undef``' operand could be the same as %C = xor %B, %B %D = undef - %E = icmp lt %D, 4 + %E = icmp slt %D, 4 %F = icmp gte %D, 4 Safe: @@ -2395,8 +2507,8 @@ Poison Values Poison values are similar to :ref:`undef values <undefvalues>`, however they also represent the fact that an instruction or constant expression -which cannot evoke side effects has nevertheless detected a condition -which results in undefined behavior. +that cannot evoke side effects has nevertheless detected a condition +that results in undefined behavior. There is currently no way of representing a poison value in the IR; they only exist when produced by operations such as :ref:`add <i_add>` with @@ -2433,8 +2545,8 @@ Poison value behavior is defined in terms of value *dependence*: successor. - Dependence is transitive. -Poison Values have the same behavior as :ref:`undef values <undefvalues>`, -with the additional affect that any instruction which has a *dependence* +Poison values have the same behavior as :ref:`undef values <undefvalues>`, +with the additional effect that any instruction that has a *dependence* on a poison value has undefined behavior. Here are some examples: @@ -2706,15 +2818,21 @@ occurs on. .. _metadata: -Metadata Nodes and Metadata Strings ------------------------------------ +Metadata +======== LLVM IR allows metadata to be attached to instructions in the program that can convey extra information about the code to the optimizers and code generator. One example application of metadata is source-level debug information. 
There are two metadata primitives: strings and nodes. -All metadata has the ``metadata`` type and is identified in syntax by a -preceding exclamation point ('``!``'). + +Metadata does not have a type, and is not a value. If referenced from a +``call`` instruction, it uses the ``metadata`` type. + +All metadata are identified in syntax by a exclamation point ('``!``'). + +Metadata Nodes and Metadata Strings +----------------------------------- A metadata string is a string surrounded by double quotes. It can contain any character by escaping non-printable characters with @@ -2728,7 +2846,17 @@ their operand. For example: .. code-block:: llvm - !{ metadata !"test\00", i32 10} + !{ !"test\00", i32 10} + +Metadata nodes that aren't uniqued use the ``distinct`` keyword. For example: + +.. code-block:: llvm + + !0 = distinct !{!"test\00", i32 10} + +``distinct`` nodes are useful when nodes shouldn't be merged based on their +content. They can also occur when transformations cause uniquing collisions +when metadata operands change. A :ref:`named metadata <namedmetadatastructure>` is a collection of metadata nodes, which can be looked up in the module symbol table. For @@ -2736,7 +2864,7 @@ example: .. code-block:: llvm - !foo = metadata !{!4, !3} + !foo = !{!4, !3} Metadata can be used as function arguments. Here ``llvm.dbg.value`` function is using two metadata arguments: @@ -2755,6 +2883,23 @@ attached to the ``add`` instruction using the ``!dbg`` identifier: More information about specific metadata nodes recognized by the optimizers and code generator is found below. +Specialized Metadata Nodes +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Specialized metadata nodes are custom data structures in metadata (as opposed +to generic tuples). Their fields are labelled, and can be specified in any +order. + +MDLocation +"""""""""" + +``MDLocation`` nodes represent source debug locations. The ``scope:`` field is +mandatory. + +.. code-block:: llvm + + !0 = !MDLocation(line: 2900, column: 42, scope: !1, inlinedAt: !2) + '``tbaa``' Metadata ^^^^^^^^^^^^^^^^^^^ @@ -2769,10 +2914,10 @@ to three fields, e.g.: .. code-block:: llvm - !0 = metadata !{ metadata !"an example type tree" } - !1 = metadata !{ metadata !"int", metadata !0 } - !2 = metadata !{ metadata !"float", metadata !0 } - !3 = metadata !{ metadata !"const float", metadata !2, i64 1 } + !0 = !{ !"an example type tree" } + !1 = !{ !"int", !0 } + !2 = !{ !"float", !0 } + !3 = !{ !"const float", !2, i64 1 } The first field is an identity field. It can be any value, usually a metadata string, which uniquely identifies the type. The most important @@ -2812,7 +2957,7 @@ its tbaa tag. e.g.: .. code-block:: llvm - !4 = metadata !{ i64 0, i64 4, metadata !1, i64 8, i64 4, metadata !2 } + !4 = !{ i64 0, i64 4, !1, i64 8, i64 4, !2 } This describes a struct with two fields. The first is at offset 0 bytes with size 4 bytes, and has tbaa tag !1. The second is at offset 8 bytes @@ -2822,6 +2967,67 @@ Note that the fields need not be contiguous. In this example, there is a 4 byte gap between the two fields. This gap represents padding which does not carry useful data and need not be preserved. +'``noalias``' and '``alias.scope``' Metadata +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``noalias`` and ``alias.scope`` metadata provide the ability to specify generic +noalias memory-access sets. This means that some collection of memory access +instructions (loads, stores, memory-accessing calls, etc.) 
that carry
+``noalias`` metadata can specifically be specified not to alias with some other
+collection of memory access instructions that carry ``alias.scope`` metadata.
+Each type of metadata specifies a list of scopes where each scope has an id and
+a domain. When evaluating an aliasing query, if, for some domain, the set
+of scopes with that domain in one instruction's ``alias.scope`` list is a
+subset of (or equal to) the set of scopes for that domain in another
+instruction's ``noalias`` list, then the two memory accesses are assumed not to
+alias.
+
+The metadata identifying each domain is itself a list containing one or two
+entries. The first entry is the name of the domain. Note that if the name is a
+string then it can be combined across functions and translation units. A
+self-reference can be used to create globally unique domain names. A
+descriptive string may optionally be provided as a second list entry.
+
+The metadata identifying each scope is also itself a list containing two or
+three entries. The first entry is the name of the scope. Note that if the name
+is a string then it can be combined across functions and translation units. A
+self-reference can be used to create globally unique scope names. A metadata
+reference to the scope's domain is the second entry. A descriptive string may
+optionally be provided as a third list entry.
+
+For example,
+
+.. code-block:: llvm
+
+    ; Two scope domains:
+    !0 = !{!0}
+    !1 = !{!1}
+
+    ; Some scopes in these domains:
+    !2 = !{!2, !0}
+    !3 = !{!3, !0}
+    !4 = !{!4, !1}
+
+    ; Some scope lists:
+    !5 = !{!4} ; A list containing only scope !4
+    !6 = !{!4, !3, !2}
+    !7 = !{!3}
+
+    ; These two instructions don't alias:
+    %0 = load float* %c, align 4, !alias.scope !5
+    store float %0, float* %arrayidx.i, align 4, !noalias !5
+
+    ; These two instructions also don't alias (for domain !1, the set of scopes
+    ; in the !alias.scope list equals that in the !noalias list):
+    %2 = load float* %c, align 4, !alias.scope !5
+    store float %2, float* %arrayidx.i2, align 4, !noalias !6
+
+    ; These two instructions may alias (for domain !0, the set of scopes in
+    ; the !noalias list is not a superset of, or equal to, the scopes in the
+    ; !alias.scope list):
+    %2 = load float* %c, align 4, !alias.scope !6
+    store float %0, float* %arrayidx.i, align 4, !noalias !7
+
 '``fpmath``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^
 
@@ -2842,7 +3048,7 @@ number representing the maximum relative error, for example:
 
 .. code-block:: llvm
 
-    !0 = metadata !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs
+    !0 = !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs
 
 '``range``' Metadata
 ^^^^^^^^^^^^^^^^^^^^
@@ -2874,10 +3080,10 @@ Examples:
 
       %d = invoke i8 @bar() to label %cont
              unwind label %lpad, !range !3 ; Can only be -2, -1, 3, 4 or 5
       ...
-      !0 = metadata !{ i8 0, i8 2 }
-      !1 = metadata !{ i8 255, i8 2 }
-      !2 = metadata !{ i8 0, i8 2, i8 3, i8 6 }
-      !3 = metadata !{ i8 -2, i8 0, i8 3, i8 6 }
+      !0 = !{ i8 0, i8 2 }
+      !1 = !{ i8 255, i8 2 }
+      !2 = !{ i8 0, i8 2, i8 3, i8 6 }
+      !3 = !{ i8 -2, i8 0, i8 3, i8 6 }
 
 '``llvm.loop``'
 ^^^^^^^^^^^^^^^
@@ -2897,20 +3103,20 @@ constructs:
 
 .. code-block:: llvm
 
-   !0 = metadata !{ metadata !0 }
-   !1 = metadata !{ metadata !1 }
+   !0 = !{!0}
+   !1 = !{!1}
 
 The loop identifier metadata can be used to specify additional per-loop
 metadata. Any operands after the first operand can be treated
-as user-defined metadata.
For example the ``llvm.loop.interleave.count``
-suggests an interleave factor to the loop interleaver:
+as user-defined metadata. For example the ``llvm.loop.unroll.count``
+suggests an unroll factor to the loop unroller:
 
 .. code-block:: llvm
 
       br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
    ...
-   !0 = metadata !{ metadata !0, metadata !1 }
-   !1 = metadata !{ metadata !"llvm.loop.interleave.count", i32 4 }
+   !0 = !{!0, !1}
+   !1 = !{!"llvm.loop.unroll.count", i32 4}
 
 '``llvm.loop.vectorize``' and '``llvm.loop.interleave``'
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2935,7 +3141,7 @@ example:
 
 .. code-block:: llvm
 
-   !0 = metadata !{ metadata !"llvm.loop.interleave.count", i32 4 }
+   !0 = !{!"llvm.loop.interleave.count", i32 4}
 
 Note that setting ``llvm.loop.interleave.count`` to 1 disables interleaving
 multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0
@@ -2951,8 +3157,8 @@ is a bit. If the bit operand value is 1 vectorization is enabled. A value of
 
 .. code-block:: llvm
 
-   !0 = metadata !{ metadata !"llvm.loop.vectorize.enable", i1 0 }
-   !1 = metadata !{ metadata !"llvm.loop.vectorize.enable", i1 1 }
+   !0 = !{!"llvm.loop.vectorize.enable", i1 0}
+   !1 = !{!"llvm.loop.vectorize.enable", i1 1}
 
 '``llvm.loop.vectorize.width``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2963,13 +3169,59 @@ operand is an integer specifying the width. For example:
 
 .. code-block:: llvm
 
-   !0 = metadata !{ metadata !"llvm.loop.vectorize.width", i32 4 }
+   !0 = !{!"llvm.loop.vectorize.width", i32 4}
 
 Note that setting ``llvm.loop.vectorize.width`` to 1 disables
 vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to
 0 or if the loop does not have this metadata the width will be
 determined automatically.
 
+'``llvm.loop.unroll``'
+^^^^^^^^^^^^^^^^^^^^^^
+
+Metadata prefixed with ``llvm.loop.unroll`` are loop unrolling
+optimization hints such as the unroll factor. ``llvm.loop.unroll``
+metadata should be used in conjunction with ``llvm.loop`` loop
+identification metadata. The ``llvm.loop.unroll`` metadata are only
+optimization hints and the unrolling will only be performed if the
+optimizer believes it is safe to do so.
+
+'``llvm.loop.unroll.count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata suggests an unroll factor to the loop unroller. The
+first operand is the string ``llvm.loop.unroll.count`` and the second
+operand is a positive integer specifying the unroll factor. For
+example:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.unroll.count", i32 4}
+
+If the trip count of the loop is less than the unroll count the loop
+will be partially unrolled.
+
+'``llvm.loop.unroll.disable``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata disables loop unrolling. The metadata has a single operand
+which is the string ``llvm.loop.unroll.disable``. For example:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.unroll.disable"}
+
+'``llvm.loop.unroll.full``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata suggests that the loop should be unrolled fully. The
+metadata has a single operand which is the string ``llvm.loop.unroll.full``.
+For example:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.unroll.full"}
+
 '``llvm.mem``'
 ^^^^^^^^^^^^^^^
 
@@ -3019,7 +3271,7 @@ metadata types that refer to the same loop identifier metadata.
 
      for.end:
    ...
-   !0 = metadata !{ metadata !0 }
+   !0 = !{!0}
 
 It is also possible to have nested parallel loops.
In that case the memory accesses refer to a list of loop identifier metadata nodes instead of @@ -3049,9 +3301,9 @@ the loop identifier metadata node directly: outer.for.end: ; preds = %for.body ... - !0 = metadata !{ metadata !1, metadata !2 } ; a list of loop identifiers - !1 = metadata !{ metadata !1 } ; an identifier for the inner loop - !2 = metadata !{ metadata !2 } ; an identifier for the outer loop + !0 = !{!1, !2} ; a list of loop identifiers + !1 = !{!1} ; an identifier for the inner loop + !2 = !{!2} ; an identifier for the outer loop Module Flags Metadata ===================== @@ -3135,12 +3387,12 @@ An example of module flags: .. code-block:: llvm - !0 = metadata !{ i32 1, metadata !"foo", i32 1 } - !1 = metadata !{ i32 4, metadata !"bar", i32 37 } - !2 = metadata !{ i32 2, metadata !"qux", i32 42 } - !3 = metadata !{ i32 3, metadata !"qux", - metadata !{ - metadata !"foo", i32 1 + !0 = !{ i32 1, !"foo", i32 1 } + !1 = !{ i32 4, !"bar", i32 37 } + !2 = !{ i32 2, !"qux", i32 42 } + !3 = !{ i32 3, !"qux", + !{ + !"foo", i32 1 } } !llvm.module.flags = !{ !0, !1, !2, !3 } @@ -3161,7 +3413,7 @@ An example of module flags: :: - metadata !{ metadata !"foo", i32 1 } + !{ !"foo", i32 1 } The behavior is to emit an error if the ``llvm.module.flags`` does not contain a flag with the ID ``!"foo"`` that has the value '1' after linking is @@ -3237,10 +3489,10 @@ For example, the following metadata section specifies two separate sets of linker options, presumably to link against ``libz`` and the ``Cocoa`` framework:: - !0 = metadata !{ i32 6, metadata !"Linker Options", - metadata !{ - metadata !{ metadata !"-lz" }, - metadata !{ metadata !"-framework", metadata !"Cocoa" } } } + !0 = !{ i32 6, !"Linker Options", + !{ + !{ !"-lz" }, + !{ !"-framework", !"Cocoa" } } } !llvm.module.flags = !{ !0 } The metadata encoding as lists of lists of options, as opposed to a collapsed @@ -3286,8 +3538,8 @@ compiled with a ``wchar_t`` width of 4 bytes, and the underlying type of an enum is the smallest type which can represent all of its values:: !llvm.module.flags = !{!0, !1} - !0 = metadata !{i32 1, metadata !"short_wchar", i32 1} - !1 = metadata !{i32 1, metadata !"short_enum", i32 0} + !0 = !{i32 1, !"short_wchar", i32 1} + !1 = !{i32 1, !"short_enum", i32 0} .. _intrinsicglobalvariables: @@ -4905,7 +5157,7 @@ Example: %agg1 = insertvalue {i32, float} undef, i32 1, 0 ; yields {i32 1, float undef} %agg2 = insertvalue {i32, float} %agg1, float %val, 1 ; yields {i32 1, float %val} - %agg3 = insertvalue {i32, {float}} %agg1, float %val, 1, 0 ; yields {i32 1, float %val} + %agg3 = insertvalue {i32, {float}} undef, float %val, 1, 0 ; yields {i32 undef, {float %val}} .. _memoryops: @@ -4985,7 +5237,7 @@ Syntax: :: - <result> = load [volatile] <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>] + <result> = load [volatile] <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !nonnull !<index>] <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment> !<index> = !{ i32 1 } @@ -5036,10 +5288,19 @@ as the ``MOVNT`` instruction on x86. The optional ``!invariant.load`` metadata must reference a single metadata name ``<index>`` corresponding to a metadata node with no entries. The existence of the ``!invariant.load`` metadata on the -instruction tells the optimizer and code generator that this load -address points to memory which does not change value during program -execution. 
The optimizer may then move this load around, for example, by -hoisting it out of loops using loop invariant code motion. +instruction tells the optimizer and code generator that the address +operand to this load points to memory which can be assumed unchanged. +Being invariant does not imply that a location is dereferenceable, +but it does imply that once the location is known dereferenceable +its value is henceforth unchanging. + +The optional ``!nonnull`` metadata must reference a single +metadata name ``<index>`` corresponding to a metadata node with no +entries. The existence of the ``!nonnull`` metadata on the +instruction tells the optimizer that the value loaded is known to +never be null. This is analogous to the ''nonnull'' attribute +on parameters and return values. This metadata can only be applied +to loads of a pointer type. Semantics: """""""""" @@ -6421,6 +6682,9 @@ This instruction requires several arguments: - The calling conventions of the caller and callee must match. - All ABI-impacting function attributes, such as sret, byval, inreg, returned, and inalloca, must match. + - The callee must be varargs iff the caller is varargs. Bitcasting a + non-varargs function to the appropriate varargs type is legal so + long as the non-varargs prefixes obey the other rules. Tail call optimization for calls marked ``tail`` is guaranteed to occur if the following conditions are met: @@ -6701,14 +6965,21 @@ variable argument handling intrinsic functions are used. .. code-block:: llvm + ; This struct is different for every platform. For most platforms, + ; it is merely an i8*. + %struct.va_list = type { i8* } + + ; For Unix x86_64 platforms, va_list is the following struct: + ; %struct.va_list = type { i32, i32, i8*, i8* } + define i32 @test(i32 %X, ...) { ; Initialize variable argument processing - %ap = alloca i8* - %ap2 = bitcast i8** %ap to i8* + %ap = alloca %struct.va_list + %ap2 = bitcast %struct.va_list* %ap to i8* call void @llvm.va_start(i8* %ap2) ; Read a single integer argument - %tmp = va_arg i8** %ap, i32 + %tmp = va_arg i8* %ap2, i32 ; Demonstrate usage of llvm.va_copy and llvm.va_end %aq = alloca i8* @@ -7027,6 +7298,56 @@ Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller. +'``llvm.frameallocate``' and '``llvm.framerecover``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8* @llvm.frameallocate(i32 %size) + declare i8* @llvm.framerecover(i8* %func, i8* %fp) + +Overview: +""""""""" + +The '``llvm.frameallocate``' intrinsic allocates stack memory at some fixed +offset from the frame pointer, and the '``llvm.framerecover``' +intrinsic applies that offset to a live frame pointer to recover the address of +the allocation. The offset is computed during frame layout of the caller of +``llvm.frameallocate``. + +Arguments: +"""""""""" + +The ``size`` argument to '``llvm.frameallocate``' must be a constant integer +indicating the amount of stack memory to allocate. As with allocas, allocating +zero bytes is legal, but the result is undefined. + +The ``func`` argument to '``llvm.framerecover``' must be a constant +bitcasted pointer to a function defined in the current module. The code +generator cannot determine the frame allocation offset of functions defined in +other modules. 
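+As a hedged sketch of this requirement (the function names and the allocation
+size are invented for illustration), the ``func`` operand is typically written
+as a constant bitcast of the parent function:
+
+.. code-block:: llvm
+
+   declare i8* @llvm.frameallocate(i32)
+   declare i8* @llvm.framerecover(i8*, i8*)
+
+   define void @parent() {
+   entry:
+     ; Reserve 8 bytes in @parent's frame, at an offset fixed at frame layout.
+     %buf = call i8* @llvm.frameallocate(i32 8)
+     ret void
+   }
+
+   define void @child(i8* %parent_fp) {
+   entry:
+     ; Recover @parent's allocation; @parent must be defined in this module and
+     ; %parent_fp must be a live frame pointer for a frame of @parent.
+     %buf = call i8* @llvm.framerecover(i8* bitcast (void ()* @parent to i8*),
+                                        i8* %parent_fp)
+     ret void
+   }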
+ +The ``fp`` argument to '``llvm.framerecover``' must be a frame +pointer of a call frame that is currently live. The return value of +'``llvm.frameaddress``' is one way to produce such a value, but most platforms +also expose the frame pointer through stack unwinding mechanisms. + +Semantics: +"""""""""" + +These intrinsics allow a group of functions to access one stack memory +allocation in an ancestor stack frame. The memory returned from +'``llvm.frameallocate``' may be allocated prior to stack realignment, so the +memory is only aligned to the ABI-required stack alignment. Each function may +only call '``llvm.frameallocate``' one or zero times from the function entry +block. The frame allocation intrinsic inhibits inlining, as any frame +allocations in the inlined function frame are likely to be at a different +offset from the one used by '``llvm.framerecover``' called with the +uninlined function. + .. _int_read_register: .. _int_write_register: @@ -7042,7 +7363,7 @@ Syntax: declare i64 @llvm.read_register.i64(metadata) declare void @llvm.write_register.i32(metadata, i32 @value) declare void @llvm.write_register.i64(metadata, i64 @value) - !0 = metadata !{metadata !"sp\00"} + !0 = !{!"sp\00"} Overview: """"""""" @@ -7264,6 +7585,50 @@ time library. This instrinsic does *not* empty the instruction pipeline. Modifications of the current function are outside the scope of the intrinsic. +'``llvm.instrprof_increment``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.instrprof_increment(i8* <name>, i64 <hash>, + i32 <num-counters>, i32 <index>) + +Overview: +""""""""" + +The '``llvm.instrprof_increment``' intrinsic can be emitted by a +frontend for use with instrumentation based profiling. These will be +lowered by the ``-instrprof`` pass to generate execution counts of a +program at runtime. + +Arguments: +"""""""""" + +The first argument is a pointer to a global variable containing the +name of the entity being instrumented. This should generally be the +(mangled) function name for a set of counters. + +The second argument is a hash value that can be used by the consumer +of the profile data to detect changes to the instrumented source, and +the third is the number of counters associated with ``name``. It is an +error if ``hash`` or ``num-counters`` differ between two instances of +``instrprof_increment`` that refer to the same name. + +The last argument refers to which of the counters for ``name`` should +be incremented. It should be a value between 0 and ``num-counters``. + +Semantics: +"""""""""" + +This intrinsic represents an increment of a profiling counter. It will +cause the ``-instrprof`` pass to generate the appropriate data +structures and the code to increment the appropriate value, in a +format that can be written out by a compiler runtime and consumed via +the ``llvm-profdata`` tool. + Standard C Library Intrinsics ----------------------------- @@ -7845,9 +8210,9 @@ all types however. declare float @llvm.fabs.f32(float %Val) declare double @llvm.fabs.f64(double %Val) - declare x86_fp80 @llvm.fabs.f80(x86_fp80 %Val) + declare x86_fp80 @llvm.fabs.f80(x86_fp80 %Val) declare fp128 @llvm.fabs.f128(fp128 %Val) - declare ppc_fp128 @llvm.fabs.ppcf128(ppc_fp128 %Val) + declare ppc_fp128 @llvm.fabs.ppcf128(ppc_fp128 %Val) Overview: """"""""" @@ -7867,6 +8232,89 @@ Semantics: This function returns the same values as the libm ``fabs`` functions would, and handles error conditions in the same way. 
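+For illustration only (this example is not part of the original reference
+text, and the function name is invented), a call to the ``f32`` variant looks
+like this:
+
+.. code-block:: llvm
+
+   declare float @llvm.fabs.f32(float)
+
+   define float @magnitude(float %x) {
+   entry:
+     %m = call float @llvm.fabs.f32(float %x)
+     ret float %m
+   }
+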
+'``llvm.minnum.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.minnum`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.minnum.f32(float %Val0, float %Val1) + declare double @llvm.minnum.f64(double %Val0, double %Val1) + declare x86_fp80 @llvm.minnum.f80(x86_fp80 %Val0, x86_fp80 %Val1) + declare fp128 @llvm.minnum.f128(fp128 %Val0, fp128 %Val1) + declare ppc_fp128 @llvm.minnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1) + +Overview: +""""""""" + +The '``llvm.minnum.*``' intrinsics return the minimum of the two +arguments. + + +Arguments: +"""""""""" + +The arguments and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +Follows the IEEE-754 semantics for minNum, which also match for libm's +fmin. + +If either operand is a NaN, returns the other non-NaN operand. Returns +NaN only if both operands are NaN. If the operands compare equal, +returns a value that compares equal to both operands. This means that +fmin(+/-0.0, +/-0.0) could return either -0.0 or 0.0. + +'``llvm.maxnum.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.maxnum`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.maxnum.f32(float %Val0, float %Val1l) + declare double @llvm.maxnum.f64(double %Val0, double %Val1) + declare x86_fp80 @llvm.maxnum.f80(x86_fp80 %Val0, x86_fp80 %Val1) + declare fp128 @llvm.maxnum.f128(fp128 %Val0, fp128 %Val1) + declare ppc_fp128 @llvm.maxnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1) + +Overview: +""""""""" + +The '``llvm.maxnum.*``' intrinsics return the maximum of the two +arguments. + + +Arguments: +"""""""""" + +The arguments and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" +Follows the IEEE-754 semantics for maxNum, which also match for libm's +fmax. + +If either operand is a NaN, returns the other non-NaN operand. Returns +NaN only if both operands are NaN. If the operands compare equal, +returns a value that compares equal to both operands. This means that +fmax(+/-0.0, +/-0.0) could return either -0.0 or 0.0. + '``llvm.copysign.*``' Intrinsic ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -8226,7 +8674,7 @@ Arguments: """""""""" The first argument is the value to be counted. This argument may be of -any integer type, or a vectory with integer element type. The return +any integer type, or a vector with integer element type. The return type must match the first argument type. The second argument must be a constant and is a flag to indicate whether @@ -8273,7 +8721,7 @@ Arguments: """""""""" The first argument is the value to be counted. This argument may be of -any integer type, or a vectory with integer element type. The return +any integer type, or a vector with integer element type. The return type must match the first argument type. The second argument must be a constant and is a flag to indicate whether @@ -8869,6 +9317,93 @@ intrinsic returns the executable address corresponding to ``tramp`` after performing the required machine specific adjustments. The pointer returned can then be :ref:`bitcast and executed <int_trampoline>`. +Masked Vector Load and Store Intrinsics +--------------------------------------- + +LLVM provides intrinsics for predicated vector load and store operations. 
The predicate is specified by a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits of the mask are on, the intrinsic is identical to a regular vector load or store. When all bits are off, no memory is accessed. + +.. _int_mload: + +'``llvm.masked.load.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" +This is an overloaded intrinsic. The loaded data is a vector of any integer or floating point data type. + +:: + + declare <16 x float> @llvm.masked.load.v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>) + declare <2 x double> @llvm.masked.load.v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>) + +Overview: +""""""""" + +Reads a vector from memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes in the passthru operand. + + +Arguments: +"""""""""" + +The first operand is the base pointer for the load. The second operand is the alignment of the source location. It must be a constant integer value. The third operand, mask, is a vector of boolean 'i1' values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the base pointer and the type of passthru operand are the same vector types. + + +Semantics: +"""""""""" + +The '``llvm.masked.load``' intrinsic is designed for conditional reading of selected vector elements in a single IR operation. It is useful for targets that support vector masked loads and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar load operations. +The result of this operation is equivalent to a regular vector load instruction followed by a 'select' between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off lanes. + + +:: + + %res = call <16 x float> @llvm.masked.load.v16f32 (<16 x float>* %ptr, i32 4, <16 x i1>%mask, <16 x float> %passthru) + + ;; The result of the two following instructions is identical aside from potential memory access exception + %loadlal = load <16 x float>* %ptr, align 4 + %res = select <16 x i1> %mask, <16 x float> %loadlal, <16 x float> %passthru + +.. _int_mstore: + +'``llvm.masked.store.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" +This is an overloaded intrinsic. The data stored in memory is a vector of any integer or floating point data type. + +:: + + declare void @llvm.masked.store.v8i32 (<8 x i32> <value>, <8 x i32> * <ptr>, i32 <alignment>, <8 x i1> <mask>) + declare void @llvm.masked.store.v16f32(<16 x i32> <value>, <16 x i32>* <ptr>, i32 <alignment>, <16 x i1> <mask>) + +Overview: +""""""""" + +Writes a vector to memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. + +Arguments: +"""""""""" + +The first operand is the vector value to be written to memory. The second operand is the base pointer for the store, it has the same underlying type as the value operand. 
The third operand is the alignment of the destination location. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements. + + +Semantics: +"""""""""" + +The '``llvm.masked.store``' intrinsics is designed for conditional writing of selected vector elements in a single IR operation. It is useful for targets that support vector masked store and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations. +The result of this operation is equivalent to a load-modify-store sequence. However, using this intrinsic prevents exceptions and data races on memory access to masked-off lanes. + +:: + + call void @llvm.masked.store.v16f32(<16 x float> %value, <16 x float>* %ptr, i32 4, <16 x i1> %mask) + + ;; The result of the following instructions is identical aside from potential data races and memory access exceptions + %oldval = load <16 x float>* %ptr, align 4 + %res = select <16 x i1> %mask, <16 x float> %value, <16 x float> %oldval + store <16 x float> %res, <16 x float>* %ptr, align 4 + + Memory Use Markers ------------------ @@ -9313,6 +9848,46 @@ Semantics: This intrinsic is lowered to the ``val``. +'``llvm.assume``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.assume(i1 %cond) + +Overview: +""""""""" + +The ``llvm.assume`` allows the optimizer to assume that the provided +condition is true. This information can then be used in simplifying other parts +of the code. + +Arguments: +"""""""""" + +The condition which the optimizer may assume is always true. + +Semantics: +"""""""""" + +The intrinsic allows the optimizer to assume that the provided condition is +always true whenever the control flow reaches the intrinsic call. No code is +generated for this intrinsic, and instructions that contribute only to the +provided condition are not used for code generation. If the condition is +violated during execution, the behavior is undefined. + +Note that the optimizer might limit the transformations performed on values +used by the ``llvm.assume`` intrinsic in order to preserve the instructions +only used to form the intrinsic's input argument. This might prove undesirable +if the extra information provided by the ``llvm.assume`` intrinsic does not cause +sufficient overall improvement in code quality. For this reason, +``llvm.assume`` should not be used to document basic mathematical invariants +that the optimizer can otherwise deduce or facts that are of little use to the +optimizer. + '``llvm.donothing``' Intrinsic ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -9326,8 +9901,9 @@ Syntax: Overview: """"""""" -The ``llvm.donothing`` intrinsic doesn't perform any operation. It's the -only intrinsic that can be called with an invoke instruction. +The ``llvm.donothing`` intrinsic doesn't perform any operation. It's one of only +two intrinsics (besides ``llvm.experimental.patchpoint``) that can be called +with an invoke instruction. Arguments: """""""""" diff --git a/docs/Lexicon.rst b/docs/Lexicon.rst index 65b7a3e84cda..9a599da12859 100644 --- a/docs/Lexicon.rst +++ b/docs/Lexicon.rst @@ -133,6 +133,15 @@ M **MC** Machine Code +N +- + +**NFC** + "No functional change". Used in a commit message to indicate that a patch + is a pure refactoring/cleanup. 
+ Usually used in the first line, so it is visible without opening the + actual commit email. + O - .. _object pointer: diff --git a/docs/LinkTimeOptimization.rst b/docs/LinkTimeOptimization.rst index c15abd325ed0..55a7486874a3 100644 --- a/docs/LinkTimeOptimization.rst +++ b/docs/LinkTimeOptimization.rst @@ -134,7 +134,7 @@ Alternative Approaches Multi-phase communication between ``libLTO`` and linker ======================================================= -The linker collects information about symbol defininitions and uses in various +The linker collects information about symbol definitions and uses in various link objects which is more accurate than any information collected by other tools during typical build cycles. The linker collects this information by looking at the definitions and uses of symbols in native .o files and using diff --git a/docs/MCJITDesignAndImplementation.rst b/docs/MCJITDesignAndImplementation.rst index 2cb62964d465..237a5be52fb8 100644 --- a/docs/MCJITDesignAndImplementation.rst +++ b/docs/MCJITDesignAndImplementation.rst @@ -57,7 +57,7 @@ attempt to retrieve an object image from its ObjectCache member, if one has been set. If a cached object image cannot be retrieved, MCJIT will
call its emitObject method. MCJIT::emitObject uses a local PassManager
instance and creates a new ObjectBufferStream instance, both of which it
-passes to TargetManager::addPassesToEmitMC before calling PassManager::run
+passes to TargetMachine::addPassesToEmitMC before calling PassManager::run
on the Module with which it was created.
.. image:: MCJIT-load.png
diff --git a/docs/Makefile b/docs/Makefile index d973af583e00..690f7726b732 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -1,10 +1,10 @@ ##===- docs/Makefile ---------------------------------------*- Makefile -*-===## -# # The LLVM Compiler Infrastructure # # This file is distributed under the University of Illinois Open Source # License. See LICENSE.TXT for details. -# ##===----------------------------------------------------------------------===## LEVEL := .. @@ -121,7 +121,8 @@ regen-ocamldoc: $(Verb) $(MKDIR) $(PROJ_OBJ_DIR)/ocamldoc/html $(Verb) \ $(OCAMLDOC) -d $(PROJ_OBJ_DIR)/ocamldoc/html -sort -colorize-code -html \ - `$(FIND) $(LEVEL)/bindings/ocaml -name "*.odoc" -exec echo -load '{}' ';'` + `$(FIND) $(LEVEL)/bindings/ocaml -name "*.odoc" \ + -path "*/$(BuildMode)/*.odoc" -exec echo -load '{}' ';'` uninstall-local:: $(Echo) Uninstalling Documentation
diff --git a/docs/MergeFunctions.rst b/docs/MergeFunctions.rst new file mode 100644 index 000000000000..6b8012e4b0cc --- /dev/null +++ b/docs/MergeFunctions.rst @@ -0,0 +1,802 @@
+=================================
+MergeFunctions pass, how it works
+=================================
+
+.. contents::
+   :local:
+
+Introduction
+============
+Sometimes code contains equal functions, or functions that do exactly the same thing even though they are not equal at the IR level (e.g. multiplication by 2 versus 'shl 1'). This can happen for several reasons: mainly the use of templates and automatic code generators, though sometimes a user simply writes the same thing twice :-)
+
+The main purpose of this pass is to recognize such functions and merge them.
+
+Why would I want to read this document?
+---------------------------------------
+This document is an extension of the comments in the pass and describes its logic. It explains the algorithm used to compare functions, and how equal functions can be combined correctly while keeping the module valid.
+
+The material is presented top-down, so the reader can start with the high-level ideas and finish with the low-level algorithm details, in preparation for reading the sources.
+
+The main goal is therefore to describe the algorithm and its logic; the concept. This document is for you if you *don't want* to read the source code but do want to understand how the pass works. The author has tried not to repeat the source code and to cover only the common cases, so that minor code changes do not force an update of this document.
+
+
+What should I know to be able to follow along with this document?
+-----------------------------------------------------------------
+
+The reader should be familiar with common compiler-engineering principles and LLVM code fundamentals. In this article we assume the reader is familiar with
+`Static Single Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+concepts. An understanding of the
+`IR structure <http://llvm.org/docs/LangRef.html#high-level-structure>`_ is
+also important.
+
+We will use terms such as
+"`module <http://llvm.org/docs/LangRef.html#high-level-structure>`_",
+"`function <http://llvm.org/docs/ProgrammersManual.html#the-function-class>`_",
+"`basic block <http://en.wikipedia.org/wiki/Basic_block>`_",
+"`user <http://llvm.org/docs/ProgrammersManual.html#the-user-class>`_",
+"`value <http://llvm.org/docs/ProgrammersManual.html#the-value-class>`_",
+"`instruction <http://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_".
+
+As a good starting point, the Kaleidoscope tutorial can be used:
+
+:doc:`tutorial/index`
+
+It is especially important to understand chapter 3 of the tutorial:
+
+:doc:`tutorial/LangImpl3`
+
+The reader should also know how passes work in LLVM; the following article can be used as a reference and starting point:
+
+:doc:`WritingAnLLVMPass`
+
+What else? Some experience with debugging and fixing LLVM passes also helps.
+
+What do I gain by reading this document?
+----------------------------------------
+The main purpose is to give the reader a comfortable description of the algorithms, namely human-readable text. It can be hard to understand an algorithm straight from the source code, because the pass relies on a few principles that have to be explained first.
+
+The author wishes everybody to avoid the situation where you read the code from top to bottom again and again and still don't understand why it was implemented that way.
+
+We hope that after reading this article the reader can easily debug and improve the MergeFunctions pass and thus help the LLVM project.
+
+Narrative structure
+-------------------
+The article consists of three parts. The first part explains the pass functionality at the top level. The second part describes the comparison procedure itself. The third part describes the merging process.
+
+Each part is also organized top-down: the top-level methods are described first, while the terminal ones come at the end of each part. If the reader sees a reference to a method that hasn't been described yet, its description can be found a bit below.
+
+Basics
+======
+
+How to do it?
+-------------
+Do we need to merge functions? Obviously yes: this is quite a common case, since we usually *do* have duplicates, and it would be good to get rid of them. But how do we detect such duplicates? The idea is this: we split functions into small bricks (parts), then we compare the number of "bricks", and if it is equal, compare the "bricks" themselves, and then draw our conclusions about the functions.
+
+What could the difference be? For example, on a machine with 64-bit pointers (let's assume we have only one address space), one function stores a 64-bit integer while another one stores a pointer. If the target is the machine mentioned above, and if the functions are identical except for that parameter type (which we can consider part of the function type), then we can treat ``uint64_t`` and ``void*`` as equal.
+
+This was just one example; the possible details are described a bit below.
+
+As another example, the reader may imagine two more functions: the first performs a multiplication by 2, while the second performs a left shift by 1.
+
+Possible solutions
+^^^^^^^^^^^^^^^^^^
+Let's briefly consider what we would have to implement in order to create full-featured function merging, and what it would mean for us.
+
+Detecting equal functions obviously requires a "detector" method that answers the question "are these two functions equal?". This "detector" method consists of tiny "sub-detectors", each of which answers the same question, but for parts of the functions.
+
+As the second step, we should merge equal functions, so there has to be a "merger" method. The "merger" accepts two functions *F1* and *F2*, and produces *F1F2*, the result of the merge.
+
+With such routines in hand, we can process the whole module and merge all equal functions.
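For illustration, here is a minimal, hypothetical pair of functions (not taken from the pass sources) that the "detector" described above would report as equal and the "merger" would fold into one:

.. code-block:: llvm

   define i32 @f1(i32 %x) {
     %r = add i32 %x, 1
     ret i32 %r
   }

   define i32 @f2(i32 %y) {
     %r = add i32 %y, 1
     ret i32 %r
   }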
+
+In this naive form we have to compare every function with every other function. As the reader may notice, this is quite expensive. Of course we could introduce hashing and other helpers, but that is still just an optimization and leaves us at the O(N*N) complexity level.
+
+Can we reach another level? Could we introduce a logarithmic search, or a random-access lookup? The answer is "yes".
+
+Random-access
+"""""""""""""
+How could it be done? Convert each function into a number and gather all of them in a special hash table. Functions with an equal hash are equal. Good hashing means that every part of the function must be taken into account, so we have to convert every function part into some number and then mix it into the hash. Lookup time would be small, but such an approach adds some delay due to the hashing routine.
+
+Logarithmical search
+""""""""""""""""""""
+We could introduce a total ordering among the set of functions; once we have it, we can implement a logarithmic search. Lookup time still depends on N, but only adds a small delay (*log(N)*).
+
+Present state
+"""""""""""""
+Both approaches (random-access and logarithmical) have been implemented and tested, and both gave a very good improvement. What was most surprising is that the logarithmical search was faster, sometimes by up to 15%. Hashing needs some extra CPU time, and that is the main reason it works slower; in most cases the total "hashing" time was greater than the total "logarithmical-search" time.
+
+So, preference has been given to the "logarithmical search".
+
+Though, if the need arises, the *logarithmical search* (read: "total ordering") could be used as a milestone on the way to a *random-access* implementation.
+
+Every comparison is based either on numbers or on flags. The *random-access* approach could use the same comparison algorithm. During comparison we exit once we find a difference, but with hashing we might have to scan the whole function body every time (note that this could be slower). As in the "total-ordering" case we would track every number and flag, but instead of comparing them we would take the sequence of numbers and turn it into a hash. So, once again, the *total ordering* can be considered a milestone towards an even faster (in theory) random-access approach.
+
+MergeFunctions, main fields and runOnModule
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+There are two especially important fields in the class:
+
+``FnTree`` – the set of all unique functions. It keeps items that could not be merged with each other. It is defined as:
+
+``std::set<FunctionNode> FnTree;``
+
+Here ``FunctionNode`` is a wrapper around the ``llvm::Function`` class that implements the "<" operator over the set of functions (below we explain exactly how it works; this is the key point in fast function comparison).
+
+``Deferred`` – the merging process can affect the bodies of functions that are already in ``FnTree``. Obviously, such functions have to be rechecked. In that case we remove them from ``FnTree`` and mark them to be rescanned, namely we put them into the ``Deferred`` list.
+
+runOnModule
+"""""""""""
+The algorithm is pretty simple:
+
+1. Put all of the module's functions into the *worklist*.
+
+2. Scan the *worklist*'s functions twice: first enumerate only strong functions and then only weak ones:
+
+   2.1. Loop body: take a function from the *worklist* (call it *FCur*) and try to insert it into *FnTree*: check whether *FCur* is equal to one of the functions already in *FnTree*.
If there *is* an equal function in *FnTree* (call it *FExists*), merge *FCur* with *FExists*; otherwise add the function from the *worklist* to *FnTree*.
+
+3. Once the *worklist* scanning and merging operations are complete, check the *Deferred* list. If it is not empty, refill the *worklist* from the *Deferred* list and do step 2 again; if *Deferred* is empty, exit from the method.
+
+Comparison and logarithmical search
+"""""""""""""""""""""""""""""""""""
+Let's recall our task: for every function *F* from module *M*, we have to find equal functions *F'* in the shortest time, and merge them into a single function.
+
+Defining a total ordering among the set of functions allows us to organize the functions into a binary tree. The complexity of the lookup procedure is then estimated as O(log(N)). But how do we define the *total ordering*?
+
+We have to introduce a single rule applicable to every pair of functions, and, following this rule, decide which of them is greater. What kind of rule could it be? Let's declare it as a "compare" method that returns one of 3 possible values:
+
+-1, left is *less* than right,
+
+0, left and right are *equal*,
+
+1, left is *greater* than right.
+
+Of course this means that we have to maintain the
+*strict and non-strict order relation properties*:
+
+* reflexivity (``a <= a``, ``a == a``, ``a >= a``),
+* antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``),
+* transitivity (if ``a <= b`` and ``b <= c``, then ``a <= c``),
+* asymmetry (if ``a < b``, then not ``b < a``).
+
+As mentioned before, the comparison routine consists of "sub-comparison-routines", each of which also consists of "sub-comparison-routines", and so on; finally it ends up with comparisons of primitives.
+
+Below, we will use the following operations:
+
+#. ``cmpNumbers(number1, number2)`` is a method that returns -1 if left is less than right; 0 if left and right are equal; and 1 otherwise.
+
+#. ``cmpFlags(flag1, flag2)`` is a hypothetical method that compares two flags. The logic is the same as in ``cmpNumbers``, where ``true`` is 1 and ``false`` is 0.
+
+The rest of the article is based on the *MergeFunctions.cpp* source code (*<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like to ask the reader to keep this file open nearby, so we can use it as a reference for further explanations.
+
+Now we're ready to proceed to the next chapter and see how it works.
+
+Functions comparison
+====================
+First, let's define exactly how we compare complex objects.
+
+Comparison of complex objects (function, basic block, etc.) is mostly based on the comparison results of their sub-objects. It is similar to the following comparison of "tree" objects:
+
+#. For two trees *T1* and *T2* we perform a *depth-first traversal* and obtain two sequences as a product: "*T1Items*" and "*T2Items*".
+
+#. We then compare the chains "*T1Items*" and "*T2Items*" in most-significant-item-first order. The result of the item comparison is the result of the *T1* and *T2* comparison itself.
+
+FunctionComparator::compare(void)
+---------------------------------
+A brief look at the source code tells us that the comparison starts in the "``int FunctionComparator::compare(void)``" method.
+
+1. The first parts to be compared are the function's attributes and some properties that fall outside the "attributes" term but could still make the functions different without changing their bodies. This part of the comparison is usually done with simple *cmpNumbers* or *cmpFlags* operations (e.g. ``cmpFlags(F1->hasGC(), F2->hasGC())``).
Below is the full list of the function's properties compared at this stage:
+
+   * *Attributes* (returned by the ``Function::getAttributes()`` method).
+
+   * *GC*: for equivalence, *RHS* and *LHS* should either both be without a *GC* or both have the same one.
+
+   * *Section*: just like the *GC*, *RHS* and *LHS* should be defined in the same section.
+
+   * *Variable arguments*: *LHS* and *RHS* should both be either with or without *var-args*.
+
+   * *Calling convention* should be the same.
+
+2. Function type. Checked by the ``FunctionComparator::cmpType(Type*, Type*)`` method. It checks the return type and the parameter types; the method itself is described later.
+
+3. Associate the function formal parameters with each other. Then, when comparing the function bodies, if we see a use of *LHS*'s *i*-th argument in *LHS*'s body, we want to see a use of *RHS*'s *i*-th argument at the same place in *RHS*'s body; otherwise the functions are different. At this stage we give preference to values that appear later in the function body (a value met first is considered *less*). This is done by the "``FunctionComparator::cmpValues(const Value*, const Value*)``" method (described a bit later).
+
+4. Function body comparison. As written in the method comments:
+
+"We do a CFG-ordered walk since the actual ordering of the blocks in the linked list is immaterial. Our walk starts at the entry block for both functions, then takes each block from each terminator in order. As an artifact, this also means that unreachable blocks are ignored."
+
+So, using this walk we get the BBs from *left* and *right* in the same order and compare them with the "``FunctionComparator::compare(const BasicBlock*, const BasicBlock*)``" method.
+
+We also associate BBs with each other, like we did with the function formal arguments (see the ``cmpValues`` method below).
+
+FunctionComparator::cmpType
+---------------------------
+Consider how the type comparison works.
+
+1. Coerce pointer to integer. If the left type is a pointer, try to coerce it to an integer type. This can be done if its address space is 0, or if address spaces are ignored altogether. Do the same for the right type.
+
+2. If the left and right types are equal, return 0. Otherwise we need to give preference to one of them, so proceed to the next step.
+
+3. If the types are of different kinds (different type IDs), return the result of the type ID comparison, treating the IDs as numbers (the ``cmpNumbers`` operation).
+
+4. If the types are vectors or integers, return the result of comparing their pointers as numbers (type objects are uniqued, so this gives a consistent, if arbitrary, order).
+
+5. Check whether the type ID belongs to the following group (call it the equivalent group):
+
+   * Void
+
+   * Float
+
+   * Double
+
+   * X86_FP80
+
+   * FP128
+
+   * PPC_FP128
+
+   * Label
+
+   * Metadata.
+
+   If the ID belongs to the group above, return 0: it is enough to see that the types have the same ``TypeID``; no additional information is required.
+
+6. Left and right are pointers. Return the result of the address space comparison (numbers comparison).
+
+7. Complex types (structures, arrays, etc.). Follow the complex-object comparison technique (see the very first paragraph of this chapter). Both *left* and *right* are expanded and their element types are checked the same way. If we get -1 or 1 at some stage, return it; otherwise return 0.
+
+8. Steps 1-7 describe all the possible cases. If we passed steps 1-7 and didn't reach any conclusion, invoke ``llvm_unreachable``, since this case is unexpected.
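The primitive operations named earlier have a very simple three-way contract; a minimal sketch under the assumptions above (the real methods live in ``FunctionComparator`` and may differ in detail):

.. code-block:: c++

   #include <cstdint>

   // Three-way comparison of two numbers: -1, 0 or 1.
   static int cmpNumbers(uint64_t L, uint64_t R) {
     if (L < R) return -1;
     if (L > R) return 1;
     return 0;
   }

   // The hypothetical cmpFlags from the text: true is treated as 1, false as 0.
   static int cmpFlags(bool L, bool R) {
     return cmpNumbers(L, R);
   }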
+
+cmpValues(const Value*, const Value*)
+-------------------------------------
+The method that compares local values.
+
+This method answers a very curious question: whether we can treat local values as equal, and which value is greater otherwise. It's better to start with an example:
+
+Consider the situation when we're looking at the same place in the left function "*FL*" and in the right function "*FR*". Every part of the *left* place is equal to the corresponding part of the *right* place, and (!) both parts use *Value* instances, for example:
+
+.. code-block:: llvm
+
+   instr0 i32 %LV   ; left side, function FL
+   instr0 i32 %RV   ; right side, function FR
+
+So, now our conclusion depends on the comparison of the *Value* instances.
+
+The main purpose of this method is to determine the relation between such values.
+
+What do we expect from equal functions? At the same place in functions "*FL*" and "*FR*" we expect to see *equal* values, or values *defined* at the same place in "*FL*" and "*FR*".
+
+Consider a small example:
+
+.. code-block:: llvm
+
+   define void %f(i32 %pf0, i32 %pf1) {
+     instr0 i32 %pf0
+     instr1 i32 %pf1
+     instr2 i32 123
+   }
+
+.. code-block:: llvm
+
+   define void %g(i32 %pg0, i32 %pg1) {
+     instr0 i32 %pg0
+     instr1 i32 %pg0
+     instr2 i32 123
+   }
+
+In this example, *pf0* is associated with *pg0*, *pf1* is associated with *pg1*, and we also declare that *pf0* < *pf1*, and thus *pg0* < *pg1*.
+
+The instructions with opcode "*instr0*" are *equal*, since their types and opcodes are equal and their values are *associated*.
+
+The instruction with opcode "*instr1*" from *f* is *greater* than the instruction with opcode "*instr1*" from *g*; here we have equal types and opcodes, but "*pf1*" is greater than "*pg0*".
+
+And the instructions with opcode "*instr2*" are equal, because their opcodes and types are equal, and the same constant is used as a value.
+
+What do we associate in cmpValues?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* Function arguments. The *i*-th argument of the left function is associated with the *i*-th argument of the right function.
+* BasicBlock instances. In the basic-block enumeration loop we associate the *i*-th BasicBlock of the left function with the *i*-th BasicBlock of the right function.
+* Instructions.
+* Instruction operands. Note that here we can meet a *Value* we have never seen before. In this case it is not a function argument, nor a *BasicBlock*, nor an *Instruction*. It is a global value, and it is a constant, since that is the only kind of global expected here. The method also compares:
+* Constants that are of the same type.
+* Constants where the right constant can be losslessly bit-cast to the left one; then we also compare them.
+
+How to implement cmpValues?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+*Association* is a case of equality for us: we simply treat such values as equal. But, in general, we need to implement an antisymmetric relation. As mentioned above, to understand what is *less*, we can use the order in which we meet values. If both values have the same order in their functions (they are met at the same time), we treat the values as *associated*; otherwise it depends on which one came first.
+
+Every time we run the top-level compare method, we initialize two identical maps (one for the left side, one for the right side):
+
+``map<Value, int> sn_mapL, sn_mapR;``
+
+The key of the map is the *Value* itself, and the *value* is its order (call it the *serial number*).
+
+To add a value *V* we perform the following procedure:
+
+``sn_map.insert(std::make_pair(V, sn_map.size()));``
+
+For the first *Value* the map returns *0*, for the second *Value* it returns *1*, and so on.
+
+We can then check whether the left and right values were met at the same time with a simple comparison:
+
+``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);``
+
+Of course, we can combine the insertion and the comparison:
+
+.. code-block:: c++
+
+   std::pair<iterator, bool>
+     LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
+     RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
+   return cmpNumbers(LeftRes.first->second, RightRes.first->second);
+
+Let's look at how the whole method could be implemented.
+
+1. We have to start with the bad news. Consider the function self-referencing and cross-referencing cases:
+
+.. code-block:: c++
+
+   // self-reference
+   unsigned fact0(unsigned n) { return n > 1 ? n * fact0(n-1) : 1; }
+   unsigned fact1(unsigned n) { return n > 1 ? n * fact1(n-1) : 1; }
+
+   // cross-reference
+   unsigned ping(unsigned n) { return n != 0 ? pong(n-1) : 0; }
+   unsigned pong(unsigned n) { return n != 0 ? ping(n-1) : 0; }
+
+..
+
+   Detecting that such functions are equal (e.g. that *fact0* equals *fact1*, and that *ping* equals *pong*) was implemented in the initial *MergeFunctions* pass version. But, unfortunately, that equivalence is not transitive, and it is the only case we can't convert to a less-equal-greater comparison. It is a rare case, 4-5 functions out of 10000 (checked on test-suite), and we hope the reader will forgive us this sacrifice in order to get the O(log(N)) pass time.
+
+2. If the left/right *Value* is a constant, we have to compare the constants. Return 0 if it is the same constant, or use the ``cmpConstants`` method otherwise.
+
+3. If the left/right value is an *InlineAsm* instance, return the result of comparing the *Value* pointers.
+
+4. Explicit association of *L* (the left value) and *R* (the right value). We need to find out whether the values were met at the same time, and are thus *associated*, or we need to establish the rule for when *L* < *R*. Now it is easy: just return the result of the numbers comparison:
+
+.. code-block:: c++
+
+   std::pair<iterator, bool>
+     LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
+     RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
+   if (LeftRes.first->second == RightRes.first->second) return 0;
+   if (LeftRes.first->second < RightRes.first->second) return -1;
+   return 1;
+
+When *cmpValues* returns 0, we can proceed with the comparison procedure. Otherwise, if we get -1 or 1, we need to pass this result to the top level and finish the comparison procedure.
+
+cmpConstants
+------------
+Performs the constants comparison as follows:
+
+1. Compare the constant types using the ``cmpType`` method. If the result is -1 or 1, go to step 2; otherwise proceed to step 3.
+
+2. If the types are different, we can still check whether the constants could be losslessly bitcast to each other. The explanation below is a modification of the ``canLosslesslyBitCastTo`` method.
+
+   2.1 Check whether the constants are of first-class types (the ``isFirstClassType`` check):
+
+   2.1.1. If both constants are *not* of a first-class type: return the result of ``cmpType``.
+
+   2.1.2. Otherwise, if the left type is not first-class, return -1. If the right type is not first-class, return 1.
+
+   2.1.3. If both types are first-class, proceed to the next step (2.1.3.1).
+
+   2.1.3.1. If the types are vectors, compare their bit widths using *cmpNumbers*. If the result is not 0, return it.
+
+   2.1.3.2.
Different types, but not vectors:
+
+   * if both of them are pointers, good for us, we can proceed to step 3;
+   * if only one of the types is a pointer, return the result of comparing the *isPointer* flags (the *cmpFlags* operation);
+   * otherwise we have no way to prove bitcastability, and thus return the result of the types comparison (-1 or 1).
+
+The steps below are for the case when the types are equal, or when the constants are bitcastable:
+
+3. One of the constants is a "*null*" value. Return the result of the ``cmpFlags(L->isNullValue(), R->isNullValue())`` comparison.
+
+4. Compare the value IDs, and return the result if it is not 0:
+
+.. code-block:: c++
+
+   if (int Res = cmpNumbers(L->getValueID(), R->getValueID()))
+     return Res;
+
+5. Compare the contents of the constants. The comparison depends on the kind of constants, but at this stage it is just a lexicographical comparison, as described at the beginning of the "*Functions comparison*" chapter. Mathematically it is equivalent to the following: we encode the left constant and the right constant (in a way similar to what the *bitcode-writer* does), and then compare the left code sequence with the right code sequence.
+
+compare(const BasicBlock*, const BasicBlock*)
+---------------------------------------------
+Compares two *BasicBlock* instances.
+
+It enumerates the instructions from the left *BB* and the right *BB* in parallel.
+
+1. It assigns serial numbers to the left and right instructions, using the ``cmpValues`` method.
+
+2. If one of left or right is a *GEP* (``GetElementPtr``), then treat a *GEP* as greater than other instructions; if both instructions are *GEPs*, use the ``cmpGEP`` method for comparison. If the result is -1 or 1, pass it to the top-level comparison (return it).
+
+3. Otherwise, compare the two instructions in detail:
+
+   3.1. Compare the operations. Call the ``cmpOperation`` method. If the result is -1 or 1, return it.
+
+   3.2. Compare the number of operands. If the result is -1 or 1, return it.
+
+   3.3. Compare the operands themselves, using the ``cmpValues`` method. Return the result if it is -1 or 1.
+
+   3.4. Compare the types of the operands, using the ``cmpType`` method. Return the result if it is -1 or 1.
+
+   3.5. Proceed to the next instruction.
+
+4. We can finish the instruction enumeration in 3 cases:
+
+   4.1. We reached the end of both the left and right basic blocks. We didn't exit in steps 1-3, so the contents are equal; return 0.
+
+   4.2. We reached the end of the left basic block only. Return -1.
+
+   4.3. We reached the end of the right basic block only. Return 1.
+
+cmpGEP
+------
+Compares two GEPs (``getelementptr`` instructions).
+
+It differs from the regular operation comparison in only one respect: the possibility of using the ``accumulateConstantOffset`` method.
+
+So, if we get a constant offset for both the left and right *GEPs*, we compare the offsets as numbers and return the comparison result.
+
+Otherwise we treat it like a regular operation (see the previous paragraph).
+
+cmpOperation
+------------
+Compares instruction opcodes and some important operation properties.
+
+1. Compare the opcodes; if they differ, return the result.
+
+2. Compare the number of operands; if it differs, return the result.
+
+3. Compare the operation types, using *cmpType*. Same as before: if the types are different, return the result.
+
+4. Compare *subclassOptionalData*, obtained with the ``getRawSubclassOptionalData`` method, and compare it as numbers.
+
+5. Compare the operand types.
+
+6. For some particular instructions, check the equivalence (the relation, in our case) of some significant attributes. For example, we have to compare the alignment for ``load`` instructions.
+
+O(log(N))
+---------
+The methods described above implement an order relation, which can then be used to compare nodes in a binary tree. So we can organize the set of functions into a binary tree and reduce the cost of the lookup procedure from O(N*N) to O(log(N)).
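Conceptually, this is what allows ``FnTree`` (a ``std::set<FunctionNode>``) to work: the three-way compare is adapted to the strict weak ordering that ``std::set`` expects. A hypothetical sketch of that wiring (the helper name ``compareFunctions`` is invented here for illustration; it is not the exact source):

.. code-block:: c++

   // Hypothetical sketch: FunctionNode orders functions by the three-way
   // comparison described above, so std::set can keep them in a binary tree.
   struct FunctionNode {
     Function *F;
     bool operator<(const FunctionNode &RHS) const {
       // compareFunctions() returns -1, 0 or 1; "less than" means -1.
       return compareFunctions(F, RHS.F) == -1;
     }
   };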
+
+Merging process, mergeTwoFunctions
+==================================
+Once *MergeFunctions* has detected that the current function (*G*) is equal to one that was analyzed before (function *F*), it calls ``mergeTwoFunctions(Function*, Function*)``.
+
+The operation affects the ``FnTree`` contents as follows: *F* stays in ``FnTree``; *G*, being equal to *F*, is not added to ``FnTree``. Calls to *G* are replaced with something else, which changes the bodies of the callers. So, functions that call *G* are put into the ``Deferred`` set, removed from ``FnTree``, and analyzed again.
+
+The approach is as follows:
+
+1. The most desirable case: we can use aliases and both *F* and *G* are weak. We turn both of them into aliases to a third, strong function *H*. Actually *H* is *F*. See below how it is done (but it's better to look straight at the source code). In this case we can simply replace *G* with *F* everywhere; we use the ``replaceAllUsesWith`` operation (*RAUW*) here.
+
+2. *F* may not be overridden, while *G* may. In that case it would be good, after merging, for the places where the overridable function was used to still use an overridable stub. So we try to make *G* an alias to *F*, or create an overridable tail-call wrapper around *F* and replace *G* with that call.
+
+3. Neither *F* nor *G* may be overridden. We can't use *RAUW*; we can only change the callers so that they call *F* instead of *G*. That's what ``replaceDirectCallers`` does.
+
+Below is a detailed description of the body.
+
+If "F" may be overridden
+------------------------
+As follows from the ``mayBeOverridden`` comments: "whether the definition of this global may be replaced by something non-equivalent at link time". If so, that's OK: we can use an alias to *F* instead of *G*, or change the call instructions themselves.
+
+HasGlobalAliases, removeUsers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+First consider the case when we have global aliases from one function name to another. Our purpose is to make both of them aliases to a third, strong function. At the same time, if we keep *F* alive and without major changes, we can leave it in ``FnTree``. We try to combine these two goals.
+
+Do a stub replacement of *F* itself with an alias to *F*:
+
+1. Create a stub function *H*, with the same name and attributes as function *F*. It takes the maximum alignment of *F* and *G*.
+
+2. Replace all uses of function *F* with uses of function *H*. This is a two-step procedure. First of all, we must take into account that all functions from which *F* is called will be changed, since we change the call argument (from *F* to *H*); so we must review these caller functions again after this procedure. We remove the callers from ``FnTree``; the method named ``removeUsers(F)`` does that (not to be confused with ``replaceAllUsesWith``):
+
+   2.1. Inside ``removeUsers(Value *V)`` we go through all the values that use value *V* (or *F* in our context). If a value is an instruction, we go to the function that holds this instruction, mark it as to-be-analyzed-again (put it into the ``Deferred`` set), and also remove the caller from ``FnTree``.
+
+   2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``.
+
+3. *H* (which now "officially" plays *F*'s role) is replaced with an alias to *F*. Do the same with *G*: replace it with an alias to *F*.
So finally, everywhere *F* was used, we use *H*, which is an alias to *F*, and everywhere *G* was used, we also have an alias to *F*.
+
+4. Set *F*'s linkage to private. Make it strong :-)
+
+No global aliases, replaceDirectCallers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If global aliases are not supported, we call ``replaceDirectCallers`` instead. We simply go through all calls to *G* and replace them with calls to *F*. If you look into the method you will see that it scans all uses of *G*, and if the use is a callee (that is, the user is a call instruction and *G* is used as the thing being called), we replace it with a use of *F*.
+
+If "F" could not be overridden, fix it!
+"""""""""""""""""""""""""""""""""""""""
+
+We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we first try to replace *G* with an alias to *F*. The following conditions are essential:
+
+* the target must support global aliases,
+* the address of *G* itself must not be significant, not named and not referenced anywhere,
+* the function must have external, local or weak linkage.
+
+Otherwise we write a thunk: a wrapper that has *G*'s interface and calls *F*, so *G* can be replaced with this wrapper.
+
+*writeAlias*
+
+As follows from the *llvm* reference:
+
+"Aliases act as *second name* for the aliasee value". So we just want to create a second name for *F* and use it instead of *G*:
+
+1. create the global alias itself (*GA*);
+
+2. adjust the alignment of *F* so that it is the maximum of its current alignment and *G*'s alignment;
+
+3. replace the uses of *G*:
+
+   3.1. first mark all callers of *G* as to-be-analyzed-again, using the ``removeUsers`` method (see the chapter above),
+
+   3.2. call ``G->replaceAllUsesWith(GA)``.
+
+4. Get rid of *G*.
+
+*writeThunk*
+
+As written in the method comments:
+
+"Replace G with a simple tail call to bitcast(F). Also replace direct uses of G with bitcast(F). Deletes G."
+
+In general it does the same as the usual callee replacement, except for the first point:
+
+1. We generate a tail-call wrapper around *F*, but with an interface that allows it to be used instead of *G*.
+
+2. "As usual": ``removeUsers`` and ``replaceAllUsesWith`` then.
+
+3. Get rid of *G*.
+
+That's it.
+==========
+We have described how to detect equal functions and how to merge them, and in the first chapter we described how it all works together. The author hopes this gives the reader a clear enough picture to improve and debug this pass.
+
+The reader is welcome to send us any questions and proposals ;-)
diff --git a/docs/Passes.rst b/docs/Passes.rst index 9f4009297aca..3f9534182c72 100644 --- a/docs/Passes.rst +++ b/docs/Passes.rst @@ -891,17 +891,24 @@ calls, or transforming sets of stores into ``memset``\ s. This pass looks for equivalent functions that are mergable and folds them. -A hash is computed from the function, based on its type and number of basic -blocks. +A total ordering is introduced over the set of functions: we define a comparison +that, for every two functions, answers which of them is greater. This allows the +functions to be arranged into a binary tree. -Once all hashes are computed, we perform an expensive equality comparison on -each function pair. This takes n^2/2 comparisons per bucket, so it's important -that the hash function be high quality. The equality comparison iterates -through each instruction in each basic block. +For every new function we check for an equivalent in the tree. -When a match is found the functions are folded. If both functions are -overridable, we move the functionality into a new internal function and leave -two overridable thunks to it.
+If an equivalent exists, we fold the two functions. If both functions are overridable, +we move the functionality into a new internal function and leave two +overridable thunks to it.
+
+If there is no equivalent, then we add this function to the tree.
+
+The lookup routine has O(log(n)) complexity, while the whole merging process has +a complexity of O(n*log(n)).
+
+Read +:doc:`this <MergeFunctions>` +article for more details.
``-mergereturn``: Unify function exit nodes ------------------------------------------- diff --git a/docs/Phabricator.rst b/docs/Phabricator.rst index 8ac9afec6c39..3f4f72ab7539 100644 --- a/docs/Phabricator.rst +++ b/docs/Phabricator.rst @@ -66,7 +66,8 @@ To upload a new patch: * Leave the drop down on *Create a new Revision...* and click *Continue*. * Enter a descriptive title and summary; add reviewers and mailing lists that you want to be included in the review. If your patch is - for LLVM, cc llvm-commits; if your patch is for Clang, cc cfe-commits. + for LLVM, add llvm-commits as a subscriber; if your patch is for Clang, + add cfe-commits. * Click *Save*. To submit an updated patch: diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst index 46ec15f93a32..85a4ad8e554a 100644 --- a/docs/ProgrammersManual.rst +++ b/docs/ProgrammersManual.rst @@ -422,9 +422,12 @@ to specify the debug type for the entire module (if you do this before you because there is no system in place to ensure that names do not conflict. If two different modules use the same string, they will all be turned on when the name is specified. This allows, for example, all debug information for -instruction scheduling to be enabled with ``-debug-only=InstrSched``, even if the source lives in multiple files. +For performance reasons, ``-debug-only`` is not available in an optimized build +(``--enable-optimized``) of LLVM. + The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It takes an additional first parameter, which is the type to use. For example, the @@ -873,7 +876,7 @@ variety of customizations. llvm/ADT/ilist_node.h ^^^^^^^^^^^^^^^^^^^^^ -``ilist_node<T>`` implements a the forward and backward links that are expected +``ilist_node<T>`` implements the forward and backward links that are expected by the ``ilist<T>`` (and analogous containers) in the default manner. ``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually diff --git a/docs/R600Usage.rst b/docs/R600Usage.rst new file mode 100644 index 000000000000..48a30c8a8dd6 --- /dev/null +++ b/docs/R600Usage.rst @@ -0,0 +1,43 @@
+============================
+User Guide for R600 Back-end
+============================
+
+Introduction
+============
+
+The R600 back-end provides ISA code generation for AMD GPUs, starting with +the R600 family up until the current Sea Islands (GCN Gen 2).
+
+
+Assembler
+=========
+
+The assembler is currently a work in progress and not yet complete. Below +are the currently supported features.
+
+SOPP Instructions
+-----------------
+
+Unless otherwise mentioned, all SOPP instructions that take an operand +accept only integer operands. No verification is performed on the +operands, so it is up to the programmer to be familiar with the range +of acceptable values.
+
+s_waitcnt
+^^^^^^^^^
+
+s_waitcnt accepts named arguments to specify which memory counter(s) to +wait for.
+
+..
code-block:: nasm + + // Wait for all counters to be 0 + s_waitcnt 0 + + // Equivalent to s_waitcnt 0. Counter names can also be delimited by + // '&' or ','. + s_waitcnt vmcnt(0) expcnt(0) lgkcmt(0) + + // Wait for vmcnt counter to be 1. + s_waitcnt vmcnt(1) + diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst index 87f96a64230f..dd54d45c99c9 100644 --- a/docs/ReleaseNotes.rst +++ b/docs/ReleaseNotes.rst @@ -1,16 +1,21 @@ ====================== -LLVM 3.5 Release Notes +LLVM 3.6 Release Notes ====================== .. contents:: :local: +.. warning:: + These are in-progress notes for the upcoming LLVM 3.6 release. You may + prefer the `LLVM 3.5 Release Notes <http://llvm.org/releases/3.5.0/docs + /ReleaseNotes.html>`_. + Introduction ============ This document contains the release notes for the LLVM Compiler Infrastructure, -release 3.5. Here we describe the status of LLVM, including major improvements +release 3.6. Here we describe the status of LLVM, including major improvements from the previous release, improvements in various subprojects of LLVM, and some of the current users of the code. All LLVM releases may be downloaded from the `LLVM releases web site <http://llvm.org/releases/>`_. @@ -21,77 +26,14 @@ have questions or comments, the `LLVM Developer's Mailing List <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ is a good place to send them. +Note that if you are reading this file from a Subversion checkout or the main +LLVM web page, this document applies to the *next* release, not the current +one. To see the release notes for a specific release, please see the `releases +page <http://llvm.org/releases/>`_. + Non-comprehensive list of changes in this release ================================================= -Changes to the MIPS Target --------------------------- - -* A large number of bugs have been fixed for big-endian Mips targets using the - N32 and N64 ABI's. Please note that some of these bugs will still affect - LLVM-IR generated by LLVM 3.5 since correct code generation depends on - appropriate usage of the ``inreg``, ``signext``, and ``zeroext`` attributes - on all function arguments and returns. - -* The registers used to return a structure containing a single 128-bit floating - point member on the N32/N64 ABI's have been changed from those specified by - the ABI documentation to match those used by GCC. The documentation specifies - that ``$f0`` and ``$f2`` should be used but GCC has used ``$f0`` and ``$f1`` - for many years. - -* Returning a zero-byte struct no longer causes incorrect code generation when - using the O32 ABI. - -* Passing structures of less than 32-bits using the O32 ABI on a big-endian - target has been fixed. - -* The exception personality has been changed for 64-bit Mips targets to - eliminate warnings about relocations in a read-only section. - -* Incorrect usage of odd-numbered single-precision floating point registers - has been fixed when the fastcc calling convention is used with 64-bit FPU's - and -mno-odd-spreg. - -* For inline assembly, the 'z' print-modifier print modifier can now be used on - non-immediate values. - -* Attempting to disassemble l[wd]c[23], s[wd]c[23], cache, and pref no longer - triggers an assertion. - -Non-comprehensive list of changes in 3.5 -======================================== - -* All backends have been changed to use the MC asm printer and support for the - non MC one has been removed. - -* Clang can now successfully self-host itself on Linux/Sparc64 and on - FreeBSD/Sparc64. 
- -* LLVM now assumes the assembler supports ``.loc`` for generating debug line - numbers. The old support for printing the debug line info directly was only - used by ``llc`` and has been removed. - -* All inline assembly is parsed by the integrated assembler when it is enabled. - Previously this was only the case for object-file output. It is now the case - for assembly output as well. The integrated assembler can be disabled with - the ``-no-integrated-as`` option. - -* llvm-ar now handles IR files like regular object files. In particular, a - regular symbol table is created for symbols defined in IR files, including - those in file scope inline assembly. - -* LLVM now always uses cfi directives for producing most stack - unwinding information. - -* The prefix for loop vectorizer hint metadata has been changed from - ``llvm.vectorizer`` to ``llvm.loop.vectorize``. In addition, - ``llvm.vectorizer.unroll`` metadata has been renamed - ``llvm.loop.interleave.count``. - -* Some backends previously implemented Atomic NAND(x,y) as ``x & ~y``. Now - all backends implement it as ``~(x & y)``, matching the semantics of GCC 4.4 - and later. - .. NOTE For small 1-3 sentence descriptions, just add an entry at the end of this list. If your description won't fit comfortably in one bullet @@ -99,6 +41,11 @@ Non-comprehensive list of changes in 3.5 functionality, or simply have a lot to talk about), see the `NOTE` below for adding a new subsection. +* Support for AuroraUX has been removed. + +* Added support for a `native object file-based bitcode wrapper format + <BitCodeFormat.html#native-object-file>`_. + * ... next change ... .. NOTE @@ -111,276 +58,186 @@ Non-comprehensive list of changes in 3.5 Makes programs 10x faster by doing Special New Thing. +Prefix data rework +------------------ + +The semantics of the ``prefix`` attribute have been changed. Users +that want the previous ``prefix`` semantics should instead use +``prologue``. To motivate this change, let's examine the primary +usecases that these attributes aim to serve, + + 1. Code sanitization metadata (e.g. Clang's undefined behavior + sanitizer) + + 2. Function hot-patching: Enable the user to insert ``nop`` operations + at the beginning of the function which can later be safely replaced + with a call to some instrumentation facility. + + 3. Language runtime metadata: Allow a compiler to insert data for + use by the runtime during execution. GHC is one example of a + compiler that needs this functionality for its + tables-next-to-code functionality. + +Previously ``prefix`` served cases (1) and (2) quite well by allowing the user +to introduce arbitrary data at the entrypoint but before the function +body. Case (3), however, was poorly handled by this approach as it +required that prefix data was valid executable code. + +In this release the concept of prefix data has been redefined to be +data which occurs immediately before the function entrypoint (i.e. the +symbol address). Since prefix data now occurs before the function +entrypoint, there is no need for the data to be valid code. + +The previous notion of prefix data now goes under the name "prologue +data" to emphasize its duality with the function epilogue. + +The intention here is to handle cases (1) and (2) with prologue data and +case (3) with prefix data. See the language reference for further details +on the semantics of these attributes. 
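As a rough illustration of the difference (the constant values here are arbitrary placeholders; see the language reference for the precise semantics):

.. code-block:: llvm

   ; Prologue data: inserted after the function entry point, so it must be
   ; valid code, e.g. a single nop-like byte a hot-patching tool can overwrite.
   define void @f() prologue i8 144 { ret void }

   ; Prefix data: laid out immediately before the function's symbol address,
   ; so it never needs to be executable (e.g. language runtime metadata).
   define void @g() prefix i32 123 { ret void }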
+ +This refactoring arose out of discussions_ with Reid Kleckner in +response to a proposal to introduce the notion of symbol offsets to +enable handling of case (3). + +.. _discussions: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html + + Changes to the ARM Backend -------------------------- -Since release 3.3, a lot of new features have been included in the ARM -back-end but weren't production ready (ie. well tested) on release 3.4. -Just after the 3.4 release, we started heavily testing two major parts -of the back-end: the integrated assembler (IAS) and the ARM exception -handling (EHABI), and now they are enabled by default on LLVM/Clang. - -The IAS received a lot of GNU extensions and directives, as well as some -specific pre-UAL instructions. Not all remaining directives will be -implemented, as we made judgement calls on the need versus the complexity, -and have chosen simplicity and future compatibility where hard decisions -had to be made. The major difference is, as stated above, the IAS validates -all inline ASM, not just for object emission, and that cause trouble with -some uses of inline ASM as pre-processor magic. - -So, while the IAS is good enough to compile large projects (including most -of the Linux kernel), there are a few things that we can't (and probably -won't) do. For those cases, please use ``-fno-integrated-as`` in Clang. - -Exception handling is another big change. After extensive testing and -changes to cooperate with Dwarf unwinding, EHABI is enabled by default. -The options ``-arm-enable-ehabi`` and ``-arm-enable-ehabi-descriptors``, -which were used to enable EHABI in the previous releases, are removed now. - -This means all ARM code will emit EH unwind tables, or CFI unwinding (for -debug/profiling), or both. To avoid run-time inconsistencies, C code will -also emit EH tables (in case they interoperate with C++ code), as is the -case for other architectures (ex. x86_64). + During this release ... + Changes to the MIPS Target -------------------------- -There has been a large amount of improvements to the MIPS target which can be -broken down into subtarget, ABI, and Integrated Assembler changes. +During this release the MIPS target has reached a few major milestones. The +compiler has gained support for MIPS-II and MIPS-III; become ABI-compatible +with GCC for big and little endian O32, N32, and N64; and is now able to +compile the Linux kernel for 32-bit targets. Additionally, LLD now supports +microMIPS for the O32 ABI on little endian targets. + +ABI +^^^ -Subtargets -^^^^^^^^^^ +A large number of bugs have been fixed for big-endian MIPS targets using the +N32 and N64 ABI's as well as a small number of bugs affecting other ABI's. +Please note that some of these bugs will still affect LLVM-IR generated by +LLVM 3.5 since correct code generation depends on appropriate usage of the +``inreg``, ``signext``, and ``zeroext`` attributes on all function arguments +and returns. -Added support for Release 6 of the MIPS32 and MIPS64 architecture (MIPS32r6 -and MIPS64r6). Release 6 makes a number of significant changes to the MIPS32 -and MIPS64 architectures. For example, FPU registers are always 64-bits wide, -FPU NaN values conform to IEEE 754 (2008), and the unaligned memory instructions -(such as lwl and lwr) have been replaced with a requirement for ordinary memory -operations to support unaligned operations. 
Full details of MIPS32 and MIPS64 -Release 6 can be found on the `MIPS64 Architecture page at Imagination -Technologies <http://www.imgtec.com/mips/architectures/mips64.asp>`_. +There are far too many corrections to provide a complete list but here are a +few notable ones: -This release also adds experimental support for MIPS-IV, cnMIPS, and Cavium -Octeon CPU's. +* Big-endian N32 and N64 now interlinks successfully with GCC compiled code. + Previously this didn't work for the majority of cases. -Support for the MIPS SIMD Architecture (MSA) has been improved to support MSA -on MIPS64. +* The registers used to return a structure containing a single 128-bit floating + point member on the N32/N64 ABI's have been changed from those specified by + the ABI documentation to match those used by GCC. The documentation specifies + that ``$f0`` and ``$f2`` should be used but GCC has used ``$f0`` and ``$f1`` + for many years. -Support for IEEE 754 (2008) NaN values has been added. +* Returning a zero-byte struct no longer causes arguments to be read from the + wrong registers when using the O32 ABI. -ABI and ABI extensions -^^^^^^^^^^^^^^^^^^^^^^ +* The exception personality has been changed for 64-bit MIPS targets to + eliminate warnings about relocations in a read-only section. -There has also been considerable ABI work since the 3.4 release. This release -adds support for the N32 ABI, the O32-FPXX ABI Extension, the O32-FP64 ABI -Extension, and the O32-FP64A ABI Extension. +* Incorrect usage of odd-numbered single-precision floating point registers + has been fixed when the fastcc calling convention is used with 64-bit FPU's + and -mno-odd-spreg. -The N32 ABI is an existing ABI that has now been implemented in LLVM. It is a -64-bit ABI that is similar to N64 but retains 32-bit pointers. N64 remains the -default 64-bit ABI in LLVM. This differs from GCC where N32 is the default -64-bit ABI. +LLVMLinux +^^^^^^^^^ -The O32-FPXX ABI Extension is 100% compatible with the O32-ABI and the O32-FP64 -ABI Extension and may be linked with either but may not be linked with both of -these simultaneously. It extends the O32 ABI to allow the same code to execute -without modification on processors with 32-bit FPU registers as well as 64-bit -FPU registers. The O32-FPXX ABI Extension is enabled by default for the O32 ABI -on mips*-img-linux-gnu and mips*-mti-linux-gnu triples and is selected with --mfpxx. It is expected that future releases of LLVM will enable the FPXX -Extension for O32 on all triples. +It is now possible to compile the Linux kernel. This currently requires a small +number of kernel patches. See the `LLVMLinux project +<http://llvm.linuxfoundation.org/index.php/Main_Page>`_ for details. -The O32-FP64 ABI Extension is an extension to the O32 ABI to fully exploit FPU's -with 64-bit registers and is enabled with -mfp64. This replaces an undocumented -and unsupported O32 extension which was previously enabled with -mfp64. It is -100% compatible with the O32-FPXX ABI Extension. +* Added -mabicalls and -mno-abicalls. The implementation may not be complete + but works sufficiently well for the Linux kernel. -The O32-FP64A ABI Extension is a restricted form of the O32-FP64 ABI Extension -which allows interlinking with unmodified binaries that use the base O32 ABI. +* Fixed multiple compatibility issues between LLVM's inline assembly support + and GCC's. -Integrated Assembler -^^^^^^^^^^^^^^^^^^^^ +* Added support for a number of directives used by Linux to the Integrated + Assembler. 
-The MIPS Integrated Assembler has undergone a substantial overhaul including a -rewrite of the assembly parser. It's not ready for general use in this release -but adventurous users may wish to enable it using ``-fintegrated-as``. +Miscellaneous +^^^^^^^^^^^^^ -In this release, the integrated assembler supports the majority of MIPS-I, -MIPS-II, MIPS-III, MIPS-IV, MIPS-V, MIPS32, MIPS32r2, MIPS32r6, MIPS64, -MIPS64r2, and MIPS64r6 as well as some of the Application Specific Extensions -such as MSA. It also supports several of the MIPS specific assembler directives -such as ``.set``, ``.module``, ``.cpload``, etc. +* Attempting to disassemble l[wd]c[23], s[wd]c[23], cache, and pref no longer + triggers an assertion. -Changes to the AArch64 Target ------------------------------ +* Added -muclibc and -mglibc to support toolchains that provide both uClibC and + GLibC. -The AArch64 target in LLVM 3.5 is based on substantially different code to the -one in LLVM 3.4, having been created as the result of merging code released by -Apple for targetting iOS with the previously existing backend. +* __SIZEOF_INT128__ is no longer defined for 64-bit targets since 128-bit + integers do not work at this time for this target. -We hope the result is a general improvement in the project. Particularly notable -changes are: +* Using $t4-$t7 with the N32 and N64 ABI is deprecated when ``-fintegrated-as`` + is in use and will be removed in LLVM 3.7. These names have never been + supported by the GNU Assembler for these ABI's. -* We should produce faster code, having combined optimisations and ideas from - both sources in the final backend. -* We have a FastISel for AArch64, which should compile time for debug builds (at - -O0). -* We can now target iOS platforms (using the triple ``arm64-apple-ios7.0``). +Changes to the PowerPC Target +----------------------------- -Background -^^^^^^^^^^ +There are numerous improvements to the PowerPC target in this release: -During the 3.5 release cycle, Apple released the source used to generate 64-bit -ARM programs on iOS platforms. This took the form of a separate backend that had -been developed in parallel to, and largely isolation from, the existing -code. +* LLVM now generates the Vector-Scalar eXtension (VSX) instructions from + version 2.06 of the Power ISA, for both big- and little-endian targets. -We decided that maintaining the two backends indefinitely was not an option, -since their features almost entirely overlapped. However, the implementation -details in both were different enough that any merge had to firmly start with -one backend as the core and cherry-pick the best features and optimisations from -the other. +* LLVM now has a POWER8 instruction scheduling description. -After discussion, we decided to start with the Apple backend (called ARM64 at -the time) since it was older, more thoroughly tested in production use, and had -fewer idiosyncracies in the implementation details. +* Address Sanitizer (ASAN) support is now fully functional. -Many people from across the community worked throughout April and May to ensure -that this merge destination had all the features we wanted, from both -sources. In many cases we could simply copy code across; others needed heavy -modification for the new host; in the most worthwhile, we looked at both -implementations and combined the best features of each in an entirely new way. +* Performance of simple atomic accesses has been greatly improved. 
-We had also decided that the name of the combined backend should be AArch64, -following ARM's official documentation. So, at the end of May the old -AArch64 directory was removed, and ARM64 renamed into its place. +* Atomic fences now use light-weight syncs where possible, again providing + significant performance benefit. -Changes to the PowerPC Target ------------------------------ +* The PowerPC target now supports PIC levels (-fPIC vs. -fpic). -The PowerPC 64-bit Little Endian subtarget (powerpc64le-unknown-linux-gnu) is -now fully supported. This includes support for the Altivec instruction set. +* PPC32 SVR4 now supports small-model PIC. -The Power Architecture 64-Bit ELFv2 ABI Specification is now supported, and -is the default ABI for Little Endian. The ELFv1 ABI remains the default ABI -for Big Endian. Currently, it is not possible to override these defaults. -That capability will be available (albeit not recommended) in a future release. +* There have been many smaller bug fixes and performance improvements. -Links to the ELFv2 ABI specification and to the Power ISA Version 2.07 -specification may be found `here <https://www-03.ibm.com/technologyconnect/tgcm/TGCMServlet.wss?alias=OpenPOWER&linkid=1n0000>`_ (free registration required). -Efforts are underway to move this to a location that doesn't require -registration, but the planned site isn't ready yet. +Changes to the OCaml bindings +----------------------------- -Experimental support for the VSX instruction set introduced with ISA 2.06 -is now available using the ``-mvsx`` switch. Work remains on this, so it -is not recommended for production use. VSX is disabled for Little Endian -regardless of this switch setting. +* The bindings now require OCaml >=4.00.0, ocamlfind, + ctypes >=0.3.0 <0.4 and OUnit 2 if tests are enabled. -Load/store cost estimates have been improved. +* The bindings can now be built using cmake as well as autoconf. -Constant hoisting has been enabled. +* LLVM 3.5 has, unfortunately, shipped a broken Llvm_executionengine + implementation. In LLVM 3.6, the bindings now fully support MCJIT, + however the interface is reworked from scratch using ctypes + and is not backwards compatible. -Global named register support has been enabled. +* Llvm_linker.Mode was removed following the changes in LLVM. + This breaks the interface of Llvm_linker. -Initial support for PIC code has been added for the 32-bit ELF subtarget. -Further support will be available in a future release. +* All combinations of ocamlc/ocamlc -custom/ocamlopt and shared/static + builds of LLVM are now supported. +* Absolute paths are not embedded into the OCaml libraries anymore. + Either OCaml >=4.02.2 must be used, which includes an rpath-like $ORIGIN + mechanism, or META file must be updated for out-of-tree installations; + see r221139. -Changes to CMake build system ------------------------------ +* As usual, many more functions have been exposed to OCaml. -* Building and installing LLVM, Clang and lld sphinx documentation can now be - done in CMake builds. If ``LLVM_ENABLE_SPHINX`` is enabled the - "``docs-<project>-html``" and "``docs-<project>-man``" targets (e.g. - ``docs-llvm-html``) become available which can be invoked directly (e.g. - ``make docs-llvm-html``) to build only the relevant sphinx documentation. If - ``LLVM_BUILD_DOCS`` is enabled then the sphinx documentation will also be - built as part of the normal build. 
Enabling this variable also means that if - the ``install`` target is invoked then the built documentation will be - installed. See :ref:`LLVM-specific variables`. - -* Both the Autoconf/Makefile and CMake build systems now generate - ``LLVMConfig.cmake`` (and other files) to export installed libraries. This - means that projects using CMake to build against LLVM libraries can now build - against an installed LLVM built by the Autoconf/Makefile system. See - :ref:`Embedding LLVM in your project` for details. - -* Use of ``llvm_map_components_to_libraries()`` by external projects is - deprecated and the new ``llvm_map_components_to_libnames()`` should be used - instead. - -External Open Source Projects Using LLVM 3.5 +External Open Source Projects Using LLVM 3.6 ============================================ An exciting aspect of LLVM is that it is used as an enabling technology for a lot of other language and tools projects. This section lists some of the -projects that have already been updated to work with LLVM 3.5. - -LDC - the LLVM-based D compiler -------------------------------- - -`D <http://dlang.org>`_ is a language with C-like syntax and static typing. It -pragmatically combines efficiency, control, and modeling power, with safety and -programmer productivity. D supports powerful concepts like Compile-Time Function -Execution (CTFE) and Template Meta-Programming, provides an innovative approach -to concurrency and offers many classical paradigms. - -`LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler -combined with LLVM as backend to produce efficient native code. LDC targets -x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux/PPC64. -Ports to other architectures like ARM, AArch64 and MIPS64 are underway. - -Portable Computing Language (pocl) ----------------------------------- - -In addition to producing an easily portable open source OpenCL -implementation, another major goal of `pocl <http://portablecl.org/>`_ -is improving performance portability of OpenCL programs with -compiler optimizations, reducing the need for target-dependent manual -optimizations. An important part of pocl is a set of LLVM passes used to -statically parallelize multiple work-items with the kernel compiler, even in -the presence of work-group barriers. This enables static parallelization of -the fine-grained static concurrency in the work groups in multiple ways. - -TTA-based Co-design Environment (TCE) -------------------------------------- - -`TCE <http://tce.cs.tut.fi/>`_ is a toolset for designing new -exposed datapath processors based on the Transport triggered architecture (TTA). -The toolset provides a complete co-design flow from C/C++ -programs down to synthesizable VHDL/Verilog and parallel program binaries. -Processor customization points include the register files, function units, -supported operations, and the interconnection network. - -TCE uses Clang and LLVM for C/C++/OpenCL C language support, target independent -optimizations and also for parts of code generation. It generates -new LLVM-based code generators "on the fly" for the designed processors and -loads them in to the compiler backend as runtime libraries to avoid -per-target recompilation of larger parts of the compiler chain. 
- -ISPC ----- - -`ISPC <http://ispc.github.io/>`_ is a C-based language based on the SPMD -(single program, multiple data) programming model that generates efficient -SIMD code for modern processors without the need for complex analysis and -autovectorization. The language exploits the concept of “varying” data types, -which ensure vector-friendly data layout, explicit vectorization and compact -representation of the program. The project uses the LLVM infrastructure for -optimization and code generation. - -Likely ------- - -`Likely <http://www.liblikely.org>`_ is an embeddable just-in-time Lisp for -image recognition and heterogenous architectures. Algorithms are just-in-time -compiled using LLVM’s MCJIT infrastructure to execute on single or -multi-threaded CPUs and potentially OpenCL SPIR or CUDA enabled GPUs. Likely -exploits the observation that while image processing and statistical learning -kernels must be written generically to handle any matrix datatype, at runtime -they tend to be executed repeatedly on the same type. Likely also seeks to -explore new optimizations for statistical learning algorithms by moving them -from an offline model generation step to a compile-time simplification of a -function (the learning algorithm) with constant arguments (the training set). +projects that have already been updated to work with LLVM 3.6. + +* A project Additional Information diff --git a/docs/SourceLevelDebugging.rst b/docs/SourceLevelDebugging.rst index 869d3a383107..3a5fa6ef24be 100644 --- a/docs/SourceLevelDebugging.rst +++ b/docs/SourceLevelDebugging.rst @@ -186,11 +186,15 @@ the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and ... } -<a name="LLVMDebugVersion">The first field of a descriptor is always an -``i32`` containing a tag value identifying the content of the descriptor. -The remaining fields are specific to the descriptor. The values of tags are -loosely bound to the tag values of DWARF information entries. However, that -does not restrict the use of the information supplied to DWARF targets. +Most of the string and integer fields in descriptors are packed into a single, +null-separated ``mdstring``. The first field of the header is always an +``i32`` containing the DWARF tag value identifying the content of the +descriptor. + +For clarity of definition in this document, these header fields are described +below split inside an imaginary ``DIHeader`` construct. This is invalid +assembly syntax. In valid IR, these fields are stringified and concatenated, +separated by ``\00``. The details of the various descriptors follow. @@ -200,19 +204,22 @@ Compile unit descriptors .. code-block:: llvm !0 = metadata !{ - i32, ;; Tag = 17 (DW_TAG_compile_unit) + DIHeader( + i32, ;; Tag = 17 (DW_TAG_compile_unit) + i32, ;; DWARF language identifier (ex. DW_LANG_C89) + mdstring, ;; Producer (ex. "4.0.1 LLVM (LLVM research group)") + i1, ;; True if this is optimized. + mdstring, ;; Flags + i32, ;; Runtime version + mdstring, ;; Split debug filename + i32 ;; Debug info emission kind (1 = Full Debug Info, 2 = Line Tables Only) + ), metadata, ;; Source directory (including trailing slash) & file pair - i32, ;; DWARF language identifier (ex. DW_LANG_C89) - metadata ;; Producer (ex. "4.0.1 LLVM (LLVM research group)") - i1, ;; True if this is optimized. 
- metadata, ;; Flags - i32 ;; Runtime version - metadata ;; List of enums types - metadata ;; List of retained types - metadata ;; List of subprograms - metadata ;; List of global variables + metadata, ;; List of enums types + metadata, ;; List of retained types + metadata, ;; List of subprograms + metadata, ;; List of global variables metadata ;; List of imported entities - metadata ;; Split debug filename } These descriptors contain a source language ID for the file (we use the DWARF @@ -235,8 +242,10 @@ File descriptors .. code-block:: llvm !0 = metadata !{ - i32, ;; Tag = 41 (DW_TAG_file_type) - metadata, ;; Source directory (including trailing slash) & file pair + DIHeader( + i32 ;; Tag = 41 (DW_TAG_file_type) + ), + metadata ;; Source directory (including trailing slash) & file pair } These descriptors contain information for a file. Global variables and top @@ -254,17 +263,18 @@ Global variable descriptors .. code-block:: llvm !1 = metadata !{ - i32, ;; Tag = 52 (DW_TAG_variable) - i32, ;; Unused field. + DIHeader( + i32, ;; Tag = 52 (DW_TAG_variable) + mdstring, ;; Name + mdstring, ;; Display name (fully qualified C++ name) + mdstring, ;; MIPS linkage name (for C++) + i32, ;; Line number where defined + i1, ;; True if the global is local to compile unit (static) + i1 ;; True if the global is defined in the compile unit (not extern) + ), metadata, ;; Reference to context descriptor - metadata, ;; Name - metadata, ;; Display name (fully qualified C++ name) - metadata, ;; MIPS linkage name (for C++) metadata, ;; Reference to file where defined - i32, ;; Line number where defined metadata, ;; Reference to type descriptor - i1, ;; True if the global is local to compile unit (static) - i1, ;; True if the global is defined in the compile unit (not extern) {}*, ;; Reference to the global variable metadata, ;; The static member declaration, if any } @@ -281,27 +291,29 @@ Subprogram descriptors .. code-block:: llvm !2 = metadata !{ - i32, ;; Tag = 46 (DW_TAG_subprogram) + DIHeader( + i32, ;; Tag = 46 (DW_TAG_subprogram) + mdstring, ;; Name + mdstring, ;; Display name (fully qualified C++ name) + mdstring, ;; MIPS linkage name (for C++) + i32, ;; Line number where defined + i1, ;; True if the global is local to compile unit (static) + i1, ;; True if the global is defined in the compile unit (not extern) + i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual + i32, ;; Index into a virtual function + i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped. + i1, ;; isOptimized + i32 ;; Line number where the scope of the subprogram begins + ), metadata, ;; Source directory (including trailing slash) & file pair metadata, ;; Reference to context descriptor - metadata, ;; Name - metadata, ;; Display name (fully qualified C++ name) - metadata, ;; MIPS linkage name (for C++) - i32, ;; Line number where defined metadata, ;; Reference to type descriptor - i1, ;; True if the global is local to compile unit (static) - i1, ;; True if the global is defined in the compile unit (not extern) - i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual - i32, ;; Index into a virtual function metadata, ;; indicates which base type contains the vtable pointer for the ;; derived class - i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped. 
- i1, ;; isOptimized {}*, ;; Reference to the LLVM function metadata, ;; Lists function template parameters metadata, ;; Function declaration descriptor - metadata, ;; List of function variables - i32 ;; Line number where the scope of the subprogram begins + metadata ;; List of function variables } These descriptors provide debug information about functions, methods and @@ -314,13 +326,14 @@ Block descriptors .. code-block:: llvm !3 = metadata !{ - i32, ;; Tag = 11 (DW_TAG_lexical_block) + DIHeader( + i32, ;; Tag = 11 (DW_TAG_lexical_block) + i32, ;; Line number + i32, ;; Column number + i32 ;; Unique ID to identify blocks from a template function + ), metadata, ;; Source directory (including trailing slash) & file pair - metadata, ;; Reference to context descriptor - i32, ;; Line number - i32, ;; Column number - i32, ;; DWARF path discriminator value - i32 ;; Unique ID to identify blocks from a template function + metadata ;; Reference to context descriptor } This descriptor provides debug information about nested blocks within a @@ -330,7 +343,10 @@ lexical blocks at same depth. .. code-block:: llvm !3 = metadata !{ - i32, ;; Tag = 11 (DW_TAG_lexical_block) + DIHeader( + i32, ;; Tag = 11 (DW_TAG_lexical_block) + i32 ;; DWARF path discriminator value + ), metadata, ;; Source directory (including trailing slash) & file pair metadata ;; Reference to the scope we're annotating with a file change } @@ -346,16 +362,18 @@ Basic type descriptors .. code-block:: llvm !4 = metadata !{ - i32, ;; Tag = 36 (DW_TAG_base_type) + DIHeader( + i32, ;; Tag = 36 (DW_TAG_base_type) + mdstring, ;; Name (may be "" for anonymous types) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags + i32 ;; DWARF type encoding + ), metadata, ;; Source directory (including trailing slash) & file pair (may be null) - metadata, ;; Reference to context - metadata, ;; Name (may be "" for anonymous types) - i32, ;; Line number where defined (may be 0) - i64, ;; Size in bits - i64, ;; Alignment in bits - i64, ;; Offset in bits - i32, ;; Flags - i32 ;; DWARF type encoding + metadata ;; Reference to context } These descriptors define primitive types used in the code. Example ``int``, @@ -389,22 +407,19 @@ Derived type descriptors .. code-block:: llvm !5 = metadata !{ - i32, ;; Tag (see below) + DIHeader( + i32, ;; Tag (see below) + mdstring, ;; Name (may be "" for anonymous types) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32 ;; Flags to encode attributes, e.g. private + ), metadata, ;; Source directory (including trailing slash) & file pair (may be null) metadata, ;; Reference to context - metadata, ;; Name (may be "" for anonymous types) - i32, ;; Line number where defined (may be 0) - i64, ;; Size in bits - i64, ;; Alignment in bits - i64, ;; Offset in bits - i32, ;; Flags to encode attributes, e.g. private metadata, ;; Reference to type derived from - metadata, ;; (optional) Name of the Objective C property associated with - ;; Objective-C an ivar, or the type of which this - ;; pointer-to-member is pointing to members of. - metadata, ;; (optional) Name of the Objective C property getter selector. - metadata, ;; (optional) Name of the Objective C property setter selector. - i32 ;; (optional) Objective C property attributes. + metadata ;; (optional) Objective C property node } These descriptors are used to define types derived from other types. 
The value @@ -452,21 +467,23 @@ Composite type descriptors .. code-block:: llvm !6 = metadata !{ - i32, ;; Tag (see below) + DIHeader( + i32, ;; Tag (see below) + mdstring, ;; Name (may be "" for anonymous types) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags + i32 ;; Runtime languages + ), metadata, ;; Source directory (including trailing slash) & file pair (may be null) metadata, ;; Reference to context - metadata, ;; Name (may be "" for anonymous types) - i32, ;; Line number where defined (may be 0) - i64, ;; Size in bits - i64, ;; Alignment in bits - i64, ;; Offset in bits - i32, ;; Flags metadata, ;; Reference to type derived from metadata, ;; Reference to array of member descriptors - i32, ;; Runtime languages metadata, ;; Base type containing the vtable pointer for this type metadata, ;; Template parameters - metadata ;; A unique identifier for type uniquing purpose (may be null) + mdstring ;; A unique identifier for type uniquing purpose (may be null) } These descriptors are used to define types that are composed of 0 or more @@ -528,9 +545,11 @@ Subrange descriptors .. code-block:: llvm !42 = metadata !{ - i32, ;; Tag = 33 (DW_TAG_subrange_type) - i64, ;; Low value - i64 ;; High value + DIHeader( + i32, ;; Tag = 33 (DW_TAG_subrange_type) + i64, ;; Low value + i64 ;; High value + ) } These descriptors are used to define ranges of array subscripts for an array @@ -547,9 +566,11 @@ Enumerator descriptors .. code-block:: llvm !6 = metadata !{ - i32, ;; Tag = 40 (DW_TAG_enumerator) - metadata, ;; Name - i64 ;; Value + DIHeader( + i32, ;; Tag = 40 (DW_TAG_enumerator) + mdstring, ;; Name + i64 ;; Value + ) } These descriptors are used to define members of an enumeration :ref:`composite @@ -561,16 +582,17 @@ Local variables .. code-block:: llvm !7 = metadata !{ - i32, ;; Tag (see below) + DIHeader( + i32, ;; Tag (see below) + mdstring, ;; Name + i32, ;; 24 bit - Line number where defined + ;; 8 bit - Argument number. 1 indicates 1st argument. + i32 ;; flags + ), metadata, ;; Context - metadata, ;; Name metadata, ;; Reference to file where defined - i32, ;; 24 bit - Line number where defined - ;; 8 bit - Argument number. 1 indicates 1st argument. metadata, ;; Reference to the type descriptor - i32, ;; flags metadata ;; (optional) Reference to inline location - metadata ;; (optional) Reference to a complex expression (see below) } These descriptors are used to define variables local to a sub program. The @@ -589,6 +611,25 @@ The context is either the subprogram or block where the variable is defined. Name the source variable name. Context and line indicate where the variable was defined. Type descriptor defines the declared type of the variable. +Complex Expressions +^^^^^^^^^^^^^^^^^^^ +.. code-block:: llvm + + !8 = metadata !{ + i32, ;; DW_TAG_expression + ... + } + +Complex expressions describe variable storage locations in terms of +prefix-notated DWARF expressions. Currently the only supported +operators are ``DW_OP_plus``, ``DW_OP_deref``, and ``DW_OP_piece``. + +The ``DW_OP_piece`` operator is used for (typically larger aggregate) +variables that are fragmented across several locations. It takes two +i32 arguments, an offset and a size in bytes to describe which piece +of the variable is at this location. + + .. 
_format_common_intrinsics: Debugger intrinsic functions @@ -726,8 +767,7 @@ Compiled to LLVM, this function would be represented like this: !15 = metadata !{i32 786688, metadata !16, metadata !"Z", metadata !5, i32 5, metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Z] \ [line 5] - !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0, - i32 0} \ + !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0} \ ; [ DW_TAG_lexical_block ] [/private/tmp/t.c] !17 = metadata !{i32 5, i32 0, metadata !16, null} !18 = metadata !{i32 6, i32 0, metadata !16, null} @@ -779,8 +819,7 @@ scope information for the variable ``Z``. .. code-block:: llvm - !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0, - i32 0} + !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0} \ ; [ DW_TAG_lexical_block ] [/private/tmp/t.c] !17 = metadata !{i32 5, i32 0, metadata !16, null} @@ -810,73 +849,15 @@ to provide completely different forms if they don't fit into the DWARF model. As support for debugging information gets added to the various LLVM source-language front-ends, the information used should be documented here. -The following sections provide examples of various C/C++ constructs and the -debug information that would best describe those constructs. +The following sections provide examples of a few C/C++ constructs and the debug +information that would best describe those constructs. The canonical +references are the ``DIDescriptor`` classes defined in +``include/llvm/IR/DebugInfo.h`` and the implementations of the helper functions +in ``lib/IR/DIBuilder.cpp``. C/C++ source file information ----------------------------- -Given the source files ``MySource.cpp`` and ``MyHeader.h`` located in the -directory ``/Users/mine/sources``, the following code: - -.. code-block:: c - - #include "MyHeader.h" - - int main(int argc, char *argv[]) { - return 0; - } - -a C/C++ front-end would generate the following descriptors: - -.. code-block:: llvm - - ... - ;; - ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp". - ;; - !0 = metadata !{ - i32 786449, ;; Tag - metadata !1, ;; File/directory name - i32 4, ;; Language Id - metadata !"clang version 3.4 ", - i1 false, ;; Optimized compile unit - metadata !"", ;; Compiler flags - i32 0, ;; Runtime version - metadata !2, ;; Enumeration types - metadata !2, ;; Retained types - metadata !3, ;; Subprograms - metadata !2, ;; Global variables - metadata !2, ;; Imported entities (declarations and namespaces) - metadata !"" ;; Split debug filename - } - - ;; - ;; Define the file for the file "/Users/mine/sources/MySource.cpp". - ;; - !1 = metadata !{ - metadata !"MySource.cpp", - metadata !"/Users/mine/sources" - } - !5 = metadata !{ - i32 786473, ;; Tag - metadata !1 - } - - ;; - ;; Define the file for the file "/Users/mine/sources/Myheader.h" - ;; - !14 = metadata !{ - i32 786473, ;; Tag - metadata !15 - } - !15 = metadata !{ - metadata !"./MyHeader.h", - metadata !"/Users/mine/sources", - } - - ... - ``llvm::Instruction`` provides easy access to metadata attached with an instruction. One can extract line number information encoded in LLVM IR using ``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``. @@ -906,7 +887,7 @@ a C/C++ front-end would generate the following descriptors: ;; ;; Define the global itself. ;; - %MyGlobal = global int 100 + @MyGlobal = global i32 100, align 4 ... 
;; ;; List of debug info of globals @@ -915,24 +896,35 @@ a C/C++ front-end would generate the following descriptors: ;; Define the compile unit. !0 = metadata !{ - i32 786449, ;; Tag - i32 0, ;; Context - i32 4, ;; Language - metadata !"foo.cpp", ;; File - metadata !"/Volumes/Data/tmp", ;; Directory - metadata !"clang version 3.1 ", ;; Producer - i1 true, ;; Deprecated field - i1 false, ;; "isOptimized"? - metadata !"", ;; Flags - i32 0, ;; Runtime Version - metadata !1, ;; Enum Types - metadata !1, ;; Retained Types - metadata !1, ;; Subprograms - metadata !3, ;; Global Variables - metadata !1, ;; Imported entities - "", ;; Split debug filename + ; Header( + ; i32 17, ;; Tag + ; i32 0, ;; Context + ; i32 4, ;; Language + ; metadata !"clang version 3.6.0 ", ;; Producer + ; i1 false, ;; "isOptimized"? + ; metadata !"", ;; Flags + ; i32 0, ;; Runtime Version + ; "", ;; Split debug filename + ; 1 ;; Full debug info + ; ) + metadata !"0x11\0012\00clang version 3.6.0 \000\00\000\00\001", + metadata !1, ;; File + metadata !2, ;; Enum Types + metadata !2, ;; Retained Types + metadata !2, ;; Subprograms + metadata !3, ;; Global Variables + metadata !2 ;; Imported entities } ; [ DW_TAG_compile_unit ] + ;; The file/directory pair. + !1 = metadata !{ + metadata !"foo.c", ;; Filename + metadata !"/Users/dexonsmith/data/llvm/debug-info" ;; Directory + } + + ;; An empty array. + !2 = metadata !{} + ;; The Array of Global Variables !3 = metadata !{ metadata !4 @@ -942,17 +934,19 @@ a C/C++ front-end would generate the following descriptors: ;; Define the global variable itself. ;; !4 = metadata !{ - i32 786484, ;; Tag - i32 0, ;; Unused + ; Header( + ; i32 52, ;; Tag + ; metadata !"MyGlobal", ;; Name + ; metadata !"MyGlobal", ;; Display Name + ; metadata !"", ;; Linkage Name + ; i32 1, ;; Line + ; i32 0, ;; IsLocalToUnit + ; i32 1 ;; IsDefinition + ; ) + metadata !"0x34\00MyGlobal\00MyGlobal\00\001\000\001", null, ;; Unused - metadata !"MyGlobal", ;; Name - metadata !"MyGlobal", ;; Display Name - metadata !"", ;; Linkage Name - metadata !6, ;; File - i32 1, ;; Line - metadata !7, ;; Type - i32 0, ;; IsLocalToUnit - i32 1, ;; IsDefinition + metadata !5, ;; File + metadata !6, ;; Type i32* @MyGlobal, ;; LLVM-IR Value null ;; Static member declaration } ; [ DW_TAG_variable ] @@ -961,28 +955,30 @@ a C/C++ front-end would generate the following descriptors: ;; Define the file ;; !5 = metadata !{ - metadata !"foo.cpp", ;; File - metadata !"/Volumes/Data/tmp", ;; Directory - } - !6 = metadata !{ - i32 786473, ;; Tag - metadata !5 ;; Unused + ; Header( + ; i32 41 ;; Tag + ; ) + metadata !"0x29", + metadata !1 ;; File/directory pair } ; [ DW_TAG_file_type ] ;; ;; Define the type ;; - !7 = metadata !{ - i32 786468, ;; Tag - null, ;; Unused - null, ;; Unused - metadata !"int", ;; Name - i32 0, ;; Line - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset - i32 0, ;; Flags - i32 5 ;; Encoding + !6 = metadata !{ + ; Header( + ; i32 36, ;; Tag + ; metadata !"int", ;; Name + ; i32 0, ;; Line + ; i64 32, ;; Size in Bits + ; i64 32, ;; Align in Bits + ; i64 0, ;; Offset + ; i32 0, ;; Flags + ; i32 5 ;; Encoding + ; ) + metadata !"0x24\00int\000\0032\0032\000\000\005", + null, ;; Unused + null ;; Unused } ; [ DW_TAG_base_type ] C/C++ function information @@ -1004,26 +1000,31 @@ a C/C++ front-end would generate the following descriptors: ;; Define the anchor for subprograms. 
;; !6 = metadata !{ - i32 786484, ;; Tag - metadata !1, ;; File - metadata !1, ;; Context - metadata !"main", ;; Name - metadata !"main", ;; Display name - metadata !"main", ;; Linkage name - i32 1, ;; Line number - metadata !4, ;; Type - i1 false, ;; Is local - i1 true, ;; Is definition - i32 0, ;; Virtuality attribute, e.g. pure virtual function - i32 0, ;; Index into virtual table for C++ methods - i32 0, ;; Type that holds virtual table. - i32 0, ;; Flags - i1 false, ;; True if this function is optimized - Function *, ;; Pointer to llvm::Function - null, ;; Function template parameters - null, ;; List of function variables (emitted when optimizing) - 1 ;; Line number of the opening '{' of the function + ; Header( + ; i32 46, ;; Tag + ; metadata !"main", ;; Name + ; metadata !"main", ;; Display name + ; metadata !"", ;; Linkage name + ; i32 1, ;; Line number + ; i1 false, ;; Is local + ; i1 true, ;; Is definition + ; i32 0, ;; Virtuality attribute, e.g. pure virtual function + ; i32 0, ;; Index into virtual table for C++ methods + ; i32 256, ;; Flags + ; i1 0, ;; True if this function is optimized + ; 1 ;; Line number of the opening '{' of the function + ; ) + metadata !"0x2e\00main\00main\00\001\000\001\000\000\00256\000\001", + metadata !1, ;; File + metadata !5, ;; Context + metadata !6, ;; Type + null, ;; Containing type + i32 (i32, i8**)* @main, ;; Pointer to llvm::Function + null, ;; Function template parameters + null, ;; Function declaration + metadata !2 ;; List of function variables (emitted when optimizing) } + ;; ;; Define the subprogram itself. ;; @@ -1031,443 +1032,6 @@ a C/C++ front-end would generate the following descriptors: ... } -C/C++ basic types ------------------ - -The following are the basic type descriptors for C/C++ core types: - -bool -^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"bool", ;; Name - i32 0, ;; Line number - i64 8, ;; Size in Bits - i64 8, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 2 ;; Encoding - } - -char -^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"char", ;; Name - i32 0, ;; Line number - i64 8, ;; Size in Bits - i64 8, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 6 ;; Encoding - } - -unsigned char -^^^^^^^^^^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"unsigned char", - i32 0, ;; Line number - i64 8, ;; Size in Bits - i64 8, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 8 ;; Encoding - } - -short -^^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"short int", - i32 0, ;; Line number - i64 16, ;; Size in Bits - i64 16, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 5 ;; Encoding - } - -unsigned short -^^^^^^^^^^^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"short unsigned int", - i32 0, ;; Line number - i64 16, ;; Size in Bits - i64 16, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 7 ;; Encoding - } - -int -^^^ - -.. 
code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"int", ;; Name - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 5 ;; Encoding - } - -unsigned int -^^^^^^^^^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"unsigned int", - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 7 ;; Encoding - } - -long long -^^^^^^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"long long int", - i32 0, ;; Line number - i64 64, ;; Size in Bits - i64 64, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 5 ;; Encoding - } - -unsigned long long -^^^^^^^^^^^^^^^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"long long unsigned int", - i32 0, ;; Line number - i64 64, ;; Size in Bits - i64 64, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 7 ;; Encoding - } - -float -^^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"float", - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 4 ;; Encoding - } - -double -^^^^^^ - -.. code-block:: llvm - - !2 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"double",;; Name - i32 0, ;; Line number - i64 64, ;; Size in Bits - i64 64, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 4 ;; Encoding - } - -C/C++ derived types -------------------- - -Given the following as an example of C/C++ derived type: - -.. code-block:: c - - typedef const int *IntPtr; - -a C/C++ front-end would generate the following descriptors: - -.. code-block:: llvm - - ;; - ;; Define the typedef "IntPtr". - ;; - !2 = metadata !{ - i32 786454, ;; Tag - metadata !3, ;; File - metadata !1, ;; Context - metadata !"IntPtr", ;; Name - i32 0, ;; Line number - i64 0, ;; Size in bits - i64 0, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - metadata !4 ;; Derived From type - } - ;; - ;; Define the pointer type. - ;; - !4 = metadata !{ - i32 786447, ;; Tag - null, ;; File - null, ;; Context - metadata !"", ;; Name - i32 0, ;; Line number - i64 64, ;; Size in bits - i64 64, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - metadata !5 ;; Derived From type - } - ;; - ;; Define the const type. - ;; - !5 = metadata !{ - i32 786470, ;; Tag - null, ;; File - null, ;; Context - metadata !"", ;; Name - i32 0, ;; Line number - i64 0, ;; Size in bits - i64 0, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - metadata !6 ;; Derived From type - } - ;; - ;; Define the int type. - ;; - !6 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"int", ;; Name - i32 0, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - i32 5 ;; Encoding - } - -C/C++ struct/union types ------------------------- - -Given the following as an example of C/C++ struct type: - -.. code-block:: c - - struct Color { - unsigned Red; - unsigned Green; - unsigned Blue; - }; - -a C/C++ front-end would generate the following descriptors: - -.. 
code-block:: llvm - - ;; - ;; Define basic type for unsigned int. - ;; - !5 = metadata !{ - i32 786468, ;; Tag - null, ;; File - null, ;; Context - metadata !"unsigned int", - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 7 ;; Encoding - } - ;; - ;; Define composite type for struct Color. - ;; - !2 = metadata !{ - i32 786451, ;; Tag - metadata !1, ;; Compile unit - null, ;; Context - metadata !"Color", ;; Name - i32 1, ;; Line number - i64 96, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - null, ;; Derived From - metadata !3, ;; Elements - i32 0, ;; Runtime Language - null, ;; Base type containing the vtable pointer for this type - null ;; Template parameters - } - - ;; - ;; Define the Red field. - ;; - !4 = metadata !{ - i32 786445, ;; Tag - metadata !1, ;; File - metadata !1, ;; Context - metadata !"Red", ;; Name - i32 2, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - metadata !5 ;; Derived From type - } - - ;; - ;; Define the Green field. - ;; - !6 = metadata !{ - i32 786445, ;; Tag - metadata !1, ;; File - metadata !1, ;; Context - metadata !"Green", ;; Name - i32 3, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 32, ;; Offset in bits - i32 0, ;; Flags - metadata !5 ;; Derived From type - } - - ;; - ;; Define the Blue field. - ;; - !7 = metadata !{ - i32 786445, ;; Tag - metadata !1, ;; File - metadata !1, ;; Context - metadata !"Blue", ;; Name - i32 4, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 64, ;; Offset in bits - i32 0, ;; Flags - metadata !5 ;; Derived From type - } - - ;; - ;; Define the array of fields used by the composite type Color. - ;; - !3 = metadata !{metadata !4, metadata !6, metadata !7} - -C/C++ enumeration types ------------------------ - -Given the following as an example of C/C++ enumeration type: - -.. code-block:: c - - enum Trees { - Spruce = 100, - Oak = 200, - Maple = 300 - }; - -a C/C++ front-end would generate the following descriptors: - -.. code-block:: llvm - - ;; - ;; Define composite type for enum Trees - ;; - !2 = metadata !{ - i32 786436, ;; Tag - metadata !1, ;; File - metadata !1, ;; Context - metadata !"Trees", ;; Name - i32 1, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - null, ;; Derived From type - metadata !3, ;; Elements - i32 0 ;; Runtime language - } - - ;; - ;; Define the array of enumerators used by composite type Trees. - ;; - !3 = metadata !{metadata !4, metadata !5, metadata !6} - - ;; - ;; Define Spruce enumerator. - ;; - !4 = metadata !{i32 786472, metadata !"Spruce", i64 100} - - ;; - ;; Define Oak enumerator. - ;; - !5 = metadata !{i32 786472, metadata !"Oak", i64 200} - - ;; - ;; Define Maple enumerator. 
- ;; - !6 = metadata !{i32 786472, metadata !"Maple", i64 300} - Debugging information format ============================ @@ -1650,21 +1214,33 @@ New DWARF Attributes New DWARF Constants ^^^^^^^^^^^^^^^^^^^ -+--------------------------------+-------+ -| Name | Value | -+================================+=======+ -| DW_AT_APPLE_PROPERTY_readonly | 0x1 | -+--------------------------------+-------+ -| DW_AT_APPLE_PROPERTY_readwrite | 0x2 | -+--------------------------------+-------+ -| DW_AT_APPLE_PROPERTY_assign | 0x4 | -+--------------------------------+-------+ -| DW_AT_APPLE_PROPERTY_retain | 0x8 | -+--------------------------------+-------+ -| DW_AT_APPLE_PROPERTY_copy | 0x10 | -+--------------------------------+-------+ -| DW_AT_APPLE_PROPERTY_nonatomic | 0x20 | -+--------------------------------+-------+ ++--------------------------------------+-------+ +| Name | Value | ++======================================+=======+ +| DW_APPLE_PROPERTY_readonly | 0x01 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_getter | 0x02 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_assign | 0x04 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_readwrite | 0x08 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_retain | 0x10 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_copy | 0x20 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_nonatomic | 0x40 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_setter | 0x80 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_atomic | 0x100 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_weak | 0x200 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_strong | 0x400 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_unsafe_unretained | 0x800 | ++--------------------------------+-----+-------+ Name Accelerator Tables ----------------------- diff --git a/docs/StackMaps.rst b/docs/StackMaps.rst index bd0fb946f9a5..5bb05540dec3 100644 --- a/docs/StackMaps.rst +++ b/docs/StackMaps.rst @@ -221,6 +221,12 @@ lowered according to the calling convention specified at the intrinsic's callsite. Variants of the intrinsic with non-void return type also return a value according to calling convention. +On PowerPC, note that the ``<target>`` must be the actual intended target of +the indirect call, not the function-descriptor address normally used as the +C/C++ function-pointer representation. As a result, the call target must be +local because no adjustment or restoration of the TOC pointer (in register r2) +will be performed. + Requesting zero patch point arguments is valid. In this case, all variable operands are handled just like ``llvm.experimental.stackmap.*``. The difference is that space will diff --git a/docs/Statepoints.rst b/docs/Statepoints.rst new file mode 100644 index 000000000000..53643b1c6d31 --- /dev/null +++ b/docs/Statepoints.rst @@ -0,0 +1,411 @@ +===================================== +Garbage Collection Safepoints in LLVM +===================================== + +.. contents:: + :local: + :depth: 2 + +Status +======= + +This document describes a set of experimental extensions to LLVM. Use +with caution. Because the intrinsics have experimental status, +compatibility across LLVM releases is not guaranteed. 
+
+LLVM currently supports an alternate mechanism for conservative
+garbage collection support using the gc_root intrinsic. The mechanism
+described here shares little in common with the alternate
+implementation and it is hoped that this mechanism will eventually
+replace the gc_root mechanism.
+
+Overview
+========
+
+To collect dead objects, garbage collectors must be able to identify
+any references to objects contained within executing code, and,
+depending on the collector, potentially update them. The collector
+does not need this information at all points in code - that would make
+the problem much harder - but only at well-defined points in the
+execution known as 'safepoints'. For most collectors, it is sufficient
+to track at least one copy of each unique pointer value. However, for
+a collector which wishes to relocate objects directly reachable from
+running code, a higher standard is required.
+
+One additional challenge is that the compiler may compute intermediate
+results ("derived pointers") which point outside of the allocation or
+even into the middle of another allocation. The eventual use of this
+intermediate value must yield an address within the bounds of the
+allocation, but such "exterior derived pointers" may be visible to the
+collector. Given this, a garbage collector cannot safely rely on the
+runtime value of an address to indicate the object it is associated
+with. If the garbage collector wishes to move any object, the
+compiler must provide a mapping, for each pointer, to an indication of
+its allocation.
+
+To simplify the interaction between a collector and the compiled code,
+most garbage collectors are organized in terms of three abstractions:
+load barriers, store barriers, and safepoints.
+
+#. A load barrier is a bit of code executed immediately after the
+   machine load instruction, but before any use of the value loaded.
+   Depending on the collector, such a barrier may be needed for all
+   loads, merely loads of a particular type (in the original source
+   language), or none at all.
+
+#. Analogously, a store barrier is a code fragment that runs
+   immediately before the machine store instruction, but after the
+   computation of the value stored. The most common use of a store
+   barrier is to update a 'card table' in a generational garbage
+   collector; a sketch of such a barrier is given below.
+
+#. A safepoint is a location at which pointers visible to the compiled
+   code (i.e. currently in registers or on the stack) are allowed to
+   change. After the safepoint completes, the actual pointer value
+   may differ, but the 'object' (as seen by the source language)
+   pointed to will not.
+
+  Note that the term 'safepoint' is somewhat overloaded. It refers to
+  both the location at which the machine state is parsable and the
+  coordination protocol involved in bringing application threads to a
+  point at which the collector can safely use that information. The
+  term "statepoint" as used in this document refers exclusively to the
+  former.
+
+This document focuses on the last item - compiler support for
+safepoints in generated code. We will assume that an outside
+mechanism has decided where to place safepoints. From our
+perspective, all safepoints will be function calls. To support
+relocation of objects directly reachable from values in compiled code,
+the collector must be able to:
+
+#. identify every copy of a pointer (including copies introduced by
+   the compiler itself) at the safepoint,
+#. identify which object each pointer relates to, and
+#. potentially update each of those copies.
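+
+As a rough illustration of the store-barrier abstraction above (it is not
+part of the statepoint design itself), a generational write barrier might
+look like the following in LLVM IR. The global ``@card_table``, the
+512-byte card size, and the function shown are purely hypothetical.
+
+.. code-block:: llvm
+
+  @card_table = external global [0 x i8]
+
+  define void @write_field(i8** %field, i8* %value) {
+  entry:
+    ; Mark the card covering %field as dirty (assuming 512-byte cards).
+    %addr = ptrtoint i8** %field to i64
+    %card = lshr i64 %addr, 9
+    %slot = getelementptr [0 x i8]* @card_table, i64 0, i64 %card
+    store i8 1, i8* %slot
+    ; The store being guarded by the barrier.
+    store i8* %value, i8** %field
+    ret void
+  }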
+
+This document describes the mechanism by which an LLVM-based compiler
+can provide this information to a language runtime/collector, and
+ensure that all pointers can be read and updated if desired. The
+heart of the approach is to construct (or rewrite) the IR in a manner
+where the possible updates performed by the garbage collector are
+explicitly visible in the IR. Doing so requires that we:
+
+#. create a new SSA value for each potentially relocated pointer, and
+   ensure that no uses of the original (non-relocated) value are
+   reachable after the safepoint,
+#. specify the relocation in a way which is opaque to the compiler to
+   ensure that the optimizer cannot introduce new uses of an
+   unrelocated value after a statepoint. This prevents the optimizer
+   from performing unsound optimizations.
+#. record a mapping of live pointers (and the allocation they're
+   associated with) for each statepoint.
+
+At the most abstract level, inserting a safepoint can be thought of as
+replacing a call instruction with a call to a multiple return value
+function which both calls the original target of the call, returns
+its result, and returns updated values for any live pointers to
+garbage collected objects.
+
+  Note that the task of identifying all live pointers to garbage
+  collected values, transforming the IR to expose a pointer giving the
+  base object for every such live pointer, and inserting all the
+  intrinsics correctly is explicitly out of scope for this document.
+  The recommended approach is described in the section on Late
+  Safepoint Placement below.
+
+This abstract function call is concretely represented by a sequence of
+intrinsic calls known as a 'statepoint sequence'.
+
+
+Let's consider a simple call in LLVM IR:
+  todo
+
+Depending on our language, we may need to allow a safepoint during the
+execution of the function called from this site. If so, we need to
+let the collector update local values in the current frame.
+
+Let's say we need to relocate SSA values 'a', 'b', and 'c' at this
+safepoint. To represent this, we would generate the statepoint
+sequence:
+
+  todo
+
+Ideally, this sequence would have been represented as an M argument, N
+return value function (where M is the number of values being
+relocated + the original call arguments and N is the original return
+value + each relocated value), but LLVM does not easily support such a
+representation.
+
+Instead, the statepoint intrinsic marks the actual site of the
+safepoint or statepoint. The statepoint returns a token value (which
+exists only at compile time). To get back the original return value
+of the call, we use the 'gc.result' intrinsic. To get the relocation
+of each pointer in turn, we use the 'gc.relocate' intrinsic with the
+appropriate index. Note that both the gc.relocate and gc.result are
+tied to the statepoint. The combination forms a "statepoint sequence"
+and represents the entirety of a parseable call or 'statepoint'.
+
+When lowered, this example would generate the following x86 assembly::
+
+  put assembly here
+
+Each of the potentially relocated values has been spilled to the
+stack, and a record of that location has been recorded to the
+:ref:`Stack Map section <stackmap-section>`. If the garbage collector
+needs to update any of these pointers during the call, it knows
+exactly what to change.
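+
+As a rough, non-normative sketch of such a sequence, consider an original
+call ``%ret = call i32 @foo(i32 %arg)`` with live GC pointers ``%a``,
+``%b`` and ``%c``, each acting as its own base pointer. The intrinsic
+names, token type and argument indices below follow the declarations in
+the Intrinsics section; the exact spelling and operand numbering used by
+any particular LLVM release may differ.
+
+.. code-block:: llvm
+
+  ; The safepoint itself: calls @foo(i32 %arg), with no deopt state and
+  ; with %a, %b and %c listed as gc pointers needing relocation.
+  %token = call i32 (i32 (i32)*, i64, i64, ...)*
+      @gc.statepoint(i32 (i32)* @foo, i64 1, i64 0, i32 %arg,
+                     i64 0,
+                     i8 addrspace(1)* %a, i8 addrspace(1)* %b,
+                     i8 addrspace(1)* %c)
+
+  ; Recover the original return value of @foo.
+  %ret = call i32 @gc.result_int(i32 %token)
+
+  ; Recover the (possibly relocated) pointers. The two indices select
+  ; the base and derived pointer within the statepoint's argument list;
+  ; in this sketch %a, %b and %c sit at indices 5, 6 and 7.
+  %a.relocated = call i8 addrspace(1)* @gc.relocate(i32 %token, i32 5, i32 5)
+  %b.relocated = call i8 addrspace(1)* @gc.relocate(i32 %token, i32 6, i32 6)
+  %c.relocated = call i8 addrspace(1)* @gc.relocate(i32 %token, i32 7, i32 7)
+
+All uses of 'a', 'b', and 'c' that are reachable after the statepoint must
+then be rewritten to use the relocated values instead of the originals.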
+
+Intrinsics
+===========
+
+'''gc.statepoint''' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+  declare i32
+  @gc.statepoint(func_type <target>, i64 <#call args>,
+                 i64 <unused>, ... (call parameters),
+                 i64 <# deopt args>, ... (deopt parameters),
+                 ... (gc parameters))
+
+Overview:
+"""""""""
+
+The statepoint intrinsic represents a call which is parse-able by the
+runtime.
+
+Operands:
+"""""""""
+
+The 'target' operand is the function actually being called. The
+target can be specified as either a symbolic LLVM function, or as an
+arbitrary Value of appropriate function type. Note that the function
+type must match the signature of the callee and the types of the 'call
+parameters' arguments.
+
+The '#call args' operand is the number of arguments to the actual
+call. It must exactly match the number of arguments passed in the
+'call parameters' variable length section.
+
+The 'unused' operand is unused and likely to be removed. Please do
+not use.
+
+The 'call parameters' arguments are simply the arguments which need to
+be passed to the call target. They will be lowered according to the
+specified calling convention and otherwise handled like a normal call
+instruction. The number of arguments must exactly match what is
+specified in '# call args'. The types must match the signature of
+'target'.
+
+The 'deopt parameters' arguments contain an arbitrary list of Values
+which is meaningful to the runtime. The runtime may read any of these
+values, but is assumed not to modify them. If the garbage collector
+might need to modify one of these values, it must also be listed in
+the 'gc pointer' argument list. The '# deopt args' field indicates
+how many operands are to be interpreted as 'deopt parameters'.
+
+The 'gc parameters' arguments contain every pointer to a garbage
+collector object which potentially needs to be updated by the garbage
+collector. Note that the argument list must explicitly contain a base
+pointer for every derived pointer listed. The order of arguments is
+unimportant. Unlike the other variable length parameter sets, this
+list is not length prefixed.
+
+Semantics:
+""""""""""
+
+A statepoint is assumed to read and write all memory. As a result,
+memory operations cannot be reordered past a statepoint. It is
+illegal to mark a statepoint as being either 'readonly' or 'readnone'.
+
+Note that legal IR cannot perform any memory operation on a 'gc
+pointer' argument of the statepoint in a location statically reachable
+from the statepoint. Instead, the explicitly relocated value (from a
+''gc.relocate'') must be used.
+
+'''gc.result''' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+  declare type*
+  @gc.result_ptr(i32 %statepoint_token)
+
+  declare fX
+  @gc.result_float(i32 %statepoint_token)
+
+  declare iX
+  @gc.result_int(i32 %statepoint_token)
+
+Overview:
+"""""""""
+
+'''gc.result''' extracts the result of the original call instruction
+which was replaced by the '''gc.statepoint'''. The '''gc.result'''
+intrinsic is actually a family of three intrinsics due to an
+implementation limitation. Other than the type of the return value,
+the semantics are the same.
+
+Operands:
+"""""""""
+
+The first and only argument is the '''gc.statepoint''' which starts
+the safepoint sequence of which this '''gc.result''' is a part.
+Despite the typing of this as a generic i32, *only* the value defined
+by a '''gc.statepoint''' is legal here.
+
+Semantics:
+""""""""""
+
+The ''gc.result'' represents the return value of the call target of
+the ''statepoint''. The type of the ''gc.result'' must exactly match
+the type of the target. If the call target returns void, there will
+be no ''gc.result''.
+
+A ''gc.result'' is modeled as a 'readnone' pure function. It has no
+side effects since it is just a projection of the return value of the
+previous call represented by the ''gc.statepoint''.
+
+'''gc.relocate''' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+  declare <type> addrspace(1)*
+  @gc.relocate(i32 %statepoint_token, i32 %base_offset, i32 %pointer_offset)
+
+Overview:
+"""""""""
+
+A ''gc.relocate'' returns the potentially relocated value of a pointer
+at the safepoint.
+
+Operands:
+"""""""""
+
+The first argument is the '''gc.statepoint''' which starts the
+safepoint sequence of which this '''gc.relocate''' is a part.
+Despite the typing of this as a generic i32, *only* the value defined
+by a '''gc.statepoint''' is legal here.
+
+The second argument is an index into the statepoint's list of arguments
+which specifies the base pointer for the pointer being relocated.
+This index must land within the 'gc parameter' section of the
+statepoint's argument list.
+
+The third argument is an index into the statepoint's list of arguments
+which specifies the (potentially) derived pointer being relocated. It
+is legal for this index to be the same as the second argument
+if-and-only-if a base pointer is being relocated. This index must land
+within the 'gc parameter' section of the statepoint's argument list.
+
+Semantics:
+""""""""""
+
+The return value of ''gc.relocate'' is the potentially relocated value
+of the pointer specified by its arguments. It is unspecified how the
+value of the returned pointer relates to the argument to the
+''gc.statepoint'' other than that a) it points to the same source
+language object with the same offset, and b) the 'based-on'
+relationship of the newly relocated pointers is a projection of the
+unrelocated pointers. In particular, the integer value of the pointer
+returned is unspecified.
+
+A ''gc.relocate'' is modeled as a 'readnone' pure function. It has no
+side effects since it is just a way to extract information about work
+done during the actual call modeled by the ''gc.statepoint''.
+
+
+Stack Map Format
+================
+
+Locations for each pointer value which may need to be read and/or
+updated by the runtime or collector are provided via the :ref:`Stack Map
+format <stackmap-format>` specified in the PatchPoint documentation.
+
+Each statepoint generates the following Locations:
+
+* Constant which describes the number of following deopt *Locations* (not
+  operands)
+* Variable number of Locations, one for each deopt parameter listed in
+  the IR statepoint (same number as described by the previous Constant)
+* Variable number of Location pairs, one pair for each unique pointer
+  which needs to be relocated. The first Location in each pair describes
+  the base pointer for the object. The second is the derived pointer
+  actually being relocated. It is guaranteed that the base pointer
+  must also appear explicitly as a relocation pair if used after the
+  statepoint. There may be fewer pairs than gc parameters in the IR
+  statepoint. Each *unique* pair will occur at least once; duplicates
+  are possible.
+
+Note that the Locations used in each section may describe the same
+physical location. For example,
+a stack slot may appear as a deopt location,
+a gc base pointer, and a gc derived pointer.
+
+The ID field of the 'StkMapRecord' for a statepoint is meaningless and
+its value is explicitly unspecified.
+
+The LiveOut section of the StkMapRecord will be empty for a statepoint
+record.
+
+Safepoint Semantics & Verification
+==================================
+
+The fundamental correctness property of the compiled code w.r.t. the
+garbage collector is a dynamic one. It must be the case that there is
+no dynamic trace such that an operation involving a potentially
+relocated pointer is observably-after a safepoint which could relocate
+it. 'observably-after' in this usage means that an outside observer
+could observe this sequence of events in a way which precludes the
+operation being performed before the safepoint.
+
+To understand why this 'observably-after' property is required,
+consider a null comparison performed on the original copy of a
+relocated pointer. Assuming that control flow follows the safepoint,
+there is no way to observe externally whether the null comparison is
+performed before or after the safepoint. (Remember, the original
+Value is unmodified by the safepoint.) The compiler is free to make
+either scheduling choice.
+
+The actual correctness property implemented is slightly stronger than
+this. We require that there be no *static path* on which a
+potentially relocated pointer is 'observably-after' it may have been
+relocated. This is slightly stronger than is strictly necessary (and
+thus may disallow some otherwise valid programs), but greatly
+simplifies reasoning about correctness of the compiled code.
+
+By construction, this property will be upheld by the optimizer if
+correctly established in the source IR. This is a key invariant of
+the design.
+
+The existing IR Verifier pass has been extended to check most of the
+local restrictions on the intrinsics mentioned in their respective
+documentation. The current implementation in LLVM does not check the
+key relocation invariant, but work on such a verifier is ongoing.
+Please ask on llvmdev if you're interested in experimenting with the
+current version.
+
+Bugs and Enhancements
+=====================
+
+Currently known bugs and enhancements under consideration can be
+tracked by performing a `bugzilla search
+<http://llvm.org/bugs/buglist.cgi?cmdtype=runnamed&namedcmd=Statepoint%20Bugs&list_id=64342>`_
+for [Statepoint] in the summary field. When filing new bugs, please
+use this tag so that interested parties see the newly filed bug. As
+with most LLVM features, design discussions take place on `llvmdev
+<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_, and patches
+should be sent to `llvm-commits
+<http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>`_ for review.
+
diff --git a/docs/TableGen/BackEnds.rst b/docs/TableGen/BackEnds.rst
index 42de41da74fb..e8544b65216d 100644
--- a/docs/TableGen/BackEnds.rst
+++ b/docs/TableGen/BackEnds.rst
@@ -78,8 +78,7 @@ returns the (currently, 32-bit unsigned) value of the instruction.
 **Output**: C++ code, implementing the target's CodeEmitter class by overriding
 the virtual functions as ``<Target>CodeEmitter::function()``.
 
-**Usage**: Used to include directly at the end of ``<Target>CodeEmitter.cpp``, and
-with option `-mc-emitter` to be included in ``<Target>MCCodeEmitter.cpp``.
+**Usage**: Used to include directly at the end of ``<Target>MCCodeEmitter.cpp``.
RegisterInfo ------------ diff --git a/docs/TableGen/LangIntro.rst b/docs/TableGen/LangIntro.rst index 54d88731012b..85c74a5b4608 100644 --- a/docs/TableGen/LangIntro.rst +++ b/docs/TableGen/LangIntro.rst @@ -94,7 +94,9 @@ supported include: uninitialized field ``0b1001011`` - binary integer value + binary integer value. + Note that this is sized by the number of bits given and will not be + silently extended/truncated. ``07654321`` octal integer value (indicated by a leading 0) @@ -116,8 +118,9 @@ supported include: In rare cases, TableGen is unable to deduce the element type in which case the user must specify it explicitly. -``{ a, b, c }`` - initializer for a "bits<3>" value +``{ a, b, 0b10 }`` + initializer for a "bits<4>" value. + 1-bit from "a", 1-bit from "b", 2-bits from 0b10. ``value`` value reference @@ -208,8 +211,8 @@ supported include: on string, int and bit objects. Use !cast<string> to compare other types of objects. -``!shl(a,b)`` ``!srl(a,b)`` ``!sra(a,b)`` ``!add(a,b)`` - The usual logical and arithmetic operators. +``!shl(a,b)`` ``!srl(a,b)`` ``!sra(a,b)`` ``!add(a,b)`` ``!and(a,b)`` + The usual binary and arithmetic operators. Note that all of the values have rules specifying how they convert to values for different types. These rules allow you to assign a value like "``7``" diff --git a/docs/TableGen/LangRef.rst b/docs/TableGen/LangRef.rst index 9b074be38dcd..134afedbb7b4 100644 --- a/docs/TableGen/LangRef.rst +++ b/docs/TableGen/LangRef.rst @@ -55,6 +55,10 @@ One aspect to note is that the :token:`DecimalInteger` token *includes* the ``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as most languages do. +Also note that :token:`BinInteger` creates a value of type ``bits<n>`` +(where ``n`` is the number of bits). This will implicitly convert to +integers when needed. + TableGen has identifier-like tokens: .. productionlist:: @@ -92,7 +96,7 @@ wide variety of meanings: .. productionlist:: BangOperator: one of :!eq !if !head !tail !con - :!add !shl !sra !srl + :!add !shl !sra !srl !and :!cast !empty !subst !foreach !listconcat !strconcat Syntax diff --git a/docs/TableGen/index.rst b/docs/TableGen/index.rst index 0860afa691ec..9526240d54f4 100644 --- a/docs/TableGen/index.rst +++ b/docs/TableGen/index.rst @@ -123,7 +123,6 @@ this (at the time of this writing): bit hasCtrlDep = 0; bit isNotDuplicable = 0; bit hasSideEffects = 0; - bit neverHasSideEffects = 0; InstrItinClass Itinerary = NoItinerary; string Constraints = ""; string DisableEncoding = ""; @@ -273,7 +272,7 @@ that's only useful for debugging of the TableGen files themselves. The power in TableGen is, however, to interpret the source files into an internal representation that can be generated into anything you want. -Current usage of TableGen is to create include huge files with tables that you +Current usage of TableGen is to create huge include files with tables that you can either include directly (if the output is in the language you're coding), or be used in pre-processing via macros surrounding the include of the file. @@ -292,7 +291,7 @@ Despite being very generic, TableGen has some deficiencies that have been pointed out numerous times. The common theme is that, while TableGen allows you to build Domain-Specific-Languages, the final languages that you create lack the power of other DSLs, which in turn increase considerably the size -and complecity of TableGen files. +and complexity of TableGen files. 
At the same time, TableGen allows you to create virtually any meaning of the basic concepts via custom-made back-ends, which can pervert the original diff --git a/docs/TestingGuide.rst b/docs/TestingGuide.rst index 481be55b576b..3463156495a9 100644 --- a/docs/TestingGuide.rst +++ b/docs/TestingGuide.rst @@ -22,7 +22,7 @@ Requirements ============ In order to use the LLVM testing infrastructure, you will need all of the -software required to build LLVM, as well as `Python <http://python.org>`_ 2.5 or +software required to build LLVM, as well as `Python <http://python.org>`_ 2.7 or later. LLVM testing infrastructure organization @@ -240,6 +240,58 @@ The recommended way to examine output to figure out if the test passes is using the :doc:`FileCheck tool <CommandGuide/FileCheck>`. *[The usage of grep in RUN lines is deprecated - please do not send or commit patches that use it.]* +Extra files +----------- + +If your test requires extra files besides the file containing the ``RUN:`` +lines, the idiomatic place to put them is in a subdirectory ``Inputs``. +You can then refer to the extra files as ``%S/Inputs/foo.bar``. + +For example, consider ``test/Linker/ident.ll``. The directory structure is +as follows:: + + test/ + Linker/ + ident.ll + Inputs/ + ident.a.ll + ident.b.ll + +For convenience, these are the contents: + +.. code-block:: llvm + + ;;;;; ident.ll: + + ; RUN: llvm-link %S/Inputs/ident.a.ll %S/Inputs/ident.b.ll -S | FileCheck %s + + ; Verify that multiple input llvm.ident metadata are linked together. + + ; CHECK-DAG: !llvm.ident = !{!0, !1, !2} + ; CHECK-DAG: "Compiler V1" + ; CHECK-DAG: "Compiler V2" + ; CHECK-DAG: "Compiler V3" + + ;;;;; Inputs/ident.a.ll: + + !llvm.ident = !{!0, !1} + !0 = metadata !{metadata !"Compiler V1"} + !1 = metadata !{metadata !"Compiler V2"} + + ;;;;; Inputs/ident.b.ll: + + !llvm.ident = !{!0} + !0 = metadata !{metadata !"Compiler V3"} + +For symmetry reasons, ``ident.ll`` is just a dummy file that doesn't +actually participate in the test besides holding the ``RUN:`` lines. + +.. note:: + + Some existing tests use ``RUN: true`` in extra files instead of just + putting the extra files in an ``Inputs/`` directory. This pattern is + deprecated. + Fragile tests ------------- diff --git a/docs/WritingAnLLVMBackend.rst b/docs/WritingAnLLVMBackend.rst index fb7c16fc5458..fdadbb04e94f 100644 --- a/docs/WritingAnLLVMBackend.rst +++ b/docs/WritingAnLLVMBackend.rst @@ -161,7 +161,7 @@ To get LLVM to actually build and link your target, you need to add it to the know about your target when parsing the ``--enable-targets`` option. Search the configure script for ``TARGETS_TO_BUILD``, add your target to the lists there (some creativity required), and then reconfigure. Alternatively, you can -change ``autotools/configure.ac`` and regenerate configure by running +change ``autoconf/configure.ac`` and regenerate configure by running ``./autoconf/AutoRegen.sh``. Target Machine diff --git a/docs/WritingAnLLVMPass.rst b/docs/WritingAnLLVMPass.rst index cfbda042cc53..ef2b9538f656 100644 --- a/docs/WritingAnLLVMPass.rst +++ b/docs/WritingAnLLVMPass.rst @@ -146,7 +146,7 @@ to avoid using expensive C++ runtime information. .. 
code-block:: c++ - virtual bool runOnFunction(Function &F) { + bool runOnFunction(Function &F) override { errs() << "Hello: "; errs().write_escaped(F.getName()) << "\n"; return false; @@ -194,7 +194,7 @@ As a whole, the ``.cpp`` file looks like: static char ID; Hello() : FunctionPass(ID) {} - virtual bool runOnFunction(Function &F) { + bool runOnFunction(Function &F) override { errs() << "Hello: "; errs().write_escaped(F.getName()) << '\n'; return false; @@ -434,9 +434,8 @@ The ``doFinalization(CallGraph &)`` method virtual bool doFinalization(CallGraph &CG); The ``doFinalization`` method is an infrequently used method that is called -when the pass framework has finished calling :ref:`runOnFunction -<writing-an-llvm-pass-runOnFunction>` for every function in the program being -compiled. +when the pass framework has finished calling :ref:`runOnSCC +<writing-an-llvm-pass-runOnSCC>` for every SCC in the program being compiled. .. _writing-an-llvm-pass-FunctionPass: @@ -456,7 +455,7 @@ To be explicit, ``FunctionPass`` subclasses are not allowed to: #. Inspect or modify a ``Function`` other than the one currently being processed. #. Add or remove ``Function``\ s from the current ``Module``. #. Add or remove global variables from the current ``Module``. -#. Maintain state across invocations of:ref:`runOnFunction +#. Maintain state across invocations of :ref:`runOnFunction <writing-an-llvm-pass-runOnFunction>` (including global data). Implementing a ``FunctionPass`` is usually straightforward (See the :ref:`Hello @@ -1163,7 +1162,7 @@ all! To fix this, we need to add the following :ref:`getAnalysisUsage .. code-block:: c++ // We don't modify the program, so we preserve all analyses - virtual void getAnalysisUsage(AnalysisUsage &AU) const { + void getAnalysisUsage(AnalysisUsage &AU) const override { AU.setPreservesAll(); } diff --git a/docs/conf.py b/docs/conf.py index e0615746dce0..659c3e07fb67 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -47,9 +47,9 @@ copyright = u'2003-2014, LLVM Project' # built documents. # # The short X.Y version. -version = '3.5' +version = '3.6' # The full version, including alpha/beta/rc tags. -release = '3.5' +release = '3.6' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/docs/index.rst b/docs/index.rst index 1d4fbd9d34d0..dd6ea18f0906 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -199,6 +199,8 @@ For developers of applications which use LLVM as a library. (`classes <http://llvm.org/doxygen/inherits.html>`_) (`tarball <http://llvm.org/doxygen/doxygen.tar.gz>`_) +`Documentation for Go bindings <http://godoc.org/llvm.org/llvm/bindings/go/llvm>`_ + `ViewVC Repository Browser <http://llvm.org/viewvc/>`_ .. @@ -235,9 +237,13 @@ For API clients and LLVM developers. WritingAnLLVMPass HowToUseAttributes NVPTXUsage + R600Usage StackMaps InAlloca BigEndianNEON + CoverageMappingFormat + Statepoints + MergeFunctions :doc:`WritingAnLLVMPass` Information on how to write LLVM transformations and analyses. @@ -316,6 +322,9 @@ For API clients and LLVM developers. :doc:`NVPTXUsage` This document describes using the NVPTX back-end to compile GPU kernels. +:doc:`R600Usage` + This document describes how to use the R600 back-end. + :doc:`StackMaps` LLVM support for mapping instruction addresses to the location of values and allowing code to be patched. @@ -324,6 +333,15 @@ For API clients and LLVM developers. 
LLVM's support for generating NEON instructions on big endian ARM targets is somewhat nonintuitive. This document explains the implementation and rationale. +:doc:`CoverageMappingFormat` + This describes the format and encoding used for LLVM’s code coverage mapping. + +:doc:`Statepoints` + This describes a set of experimental extensions for garbage + collection support. + +:doc:`MergeFunctions` + Describes functions merging optimization. Development Process Documentation ================================= @@ -340,6 +358,7 @@ Information about LLVM's development process. HowToReleaseLLVM Packaging ReleaseProcess + Phabricator :doc:`DeveloperPolicy` The LLVM project's policy towards developers and their contributions. @@ -361,11 +380,15 @@ Information about LLVM's development process. This is a guide to preparing LLVM releases. Most developers can ignore it. :doc:`ReleaseProcess` - This is a validate a new release, during the release process. Most developers can ignore it. + This is a guide to validate a new release, during the release process. Most developers can ignore it. :doc:`Packaging` Advice on packaging LLVM into a distribution. +:doc:`Phabricator` + Describes how to use the Phabricator code review tool hosted on + http://reviews.llvm.org/ and its command line interface, Arcanist. + Community ========= diff --git a/docs/tutorial/LangImpl3.rst b/docs/tutorial/LangImpl3.rst index 7174c09c622b..b7418ccf32a9 100644 --- a/docs/tutorial/LangImpl3.rst +++ b/docs/tutorial/LangImpl3.rst @@ -581,7 +581,7 @@ our makefile/command line about which options to use: .. code-block:: bash # Compile - clang++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy + clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy # Run ./toy diff --git a/docs/tutorial/LangImpl4.rst b/docs/tutorial/LangImpl4.rst index 44e0cc150954..cdaac634dd76 100644 --- a/docs/tutorial/LangImpl4.rst +++ b/docs/tutorial/LangImpl4.rst @@ -428,7 +428,7 @@ the LLVM JIT and optimizer. To build this example, use: .. code-block:: bash # Compile - clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy # Run ./toy diff --git a/docs/tutorial/LangImpl5.rst b/docs/tutorial/LangImpl5.rst index ed5b652f6338..72e34b1c9cdd 100644 --- a/docs/tutorial/LangImpl5.rst +++ b/docs/tutorial/LangImpl5.rst @@ -736,7 +736,7 @@ the if/then/else and for expressions.. To build this example, use: .. code-block:: bash # Compile - clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy # Run ./toy diff --git a/docs/tutorial/LangImpl6.rst b/docs/tutorial/LangImpl6.rst index 42839fbd7504..bf78bdea74d6 100644 --- a/docs/tutorial/LangImpl6.rst +++ b/docs/tutorial/LangImpl6.rst @@ -729,7 +729,7 @@ the if/then/else and for expressions.. To build this example, use: .. code-block:: bash # Compile - clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy # Run ./toy diff --git a/docs/tutorial/LangImpl7.rst b/docs/tutorial/LangImpl7.rst index 849ce50060cc..141f1389630f 100644 --- a/docs/tutorial/LangImpl7.rst +++ b/docs/tutorial/LangImpl7.rst @@ -847,7 +847,7 @@ mutable variables and var/in support. 
To build this example, use: .. code-block:: bash # Compile - clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy # Run ./toy diff --git a/docs/tutorial/LangImpl8.rst b/docs/tutorial/LangImpl8.rst index 6f694931ef86..7b02468180f5 100644 --- a/docs/tutorial/LangImpl8.rst +++ b/docs/tutorial/LangImpl8.rst @@ -1,267 +1,459 @@ -====================================================== -Kaleidoscope: Conclusion and other useful LLVM tidbits -====================================================== +======================================================= +Kaleidoscope: Extending the Language: Debug Information +======================================================= .. contents:: :local: -Tutorial Conclusion -=================== - -Welcome to the final chapter of the "`Implementing a language with -LLVM <index.html>`_" tutorial. In the course of this tutorial, we have -grown our little Kaleidoscope language from being a useless toy, to -being a semi-interesting (but probably still useless) toy. :) - -It is interesting to see how far we've come, and how little code it has -taken. We built the entire lexer, parser, AST, code generator, and an -interactive run-loop (with a JIT!) by-hand in under 700 lines of -(non-comment/non-blank) code. - -Our little language supports a couple of interesting features: it -supports user defined binary and unary operators, it uses JIT -compilation for immediate evaluation, and it supports a few control flow -constructs with SSA construction. - -Part of the idea of this tutorial was to show you how easy and fun it -can be to define, build, and play with languages. Building a compiler -need not be a scary or mystical process! Now that you've seen some of -the basics, I strongly encourage you to take the code and hack on it. -For example, try adding: - -- **global variables** - While global variables have questional value - in modern software engineering, they are often useful when putting - together quick little hacks like the Kaleidoscope compiler itself. - Fortunately, our current setup makes it very easy to add global - variables: just have value lookup check to see if an unresolved - variable is in the global variable symbol table before rejecting it. - To create a new global variable, make an instance of the LLVM - ``GlobalVariable`` class. -- **typed variables** - Kaleidoscope currently only supports variables - of type double. This gives the language a very nice elegance, because - only supporting one type means that you never have to specify types. - Different languages have different ways of handling this. The easiest - way is to require the user to specify types for every variable - definition, and record the type of the variable in the symbol table - along with its Value\*. -- **arrays, structs, vectors, etc** - Once you add types, you can start - extending the type system in all sorts of interesting ways. Simple - arrays are very easy and are quite useful for many different - applications. Adding them is mostly an exercise in learning how the - LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction - works: it is so nifty/unconventional, it `has its own - FAQ <../GetElementPtr.html>`_! If you add support for recursive types - (e.g. linked lists), make sure to read the `section in the LLVM - Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that - describes how to construct them. 
-- **standard runtime** - Our current language allows the user to access - arbitrary external functions, and we use it for things like "printd" - and "putchard". As you extend the language to add higher-level - constructs, often these constructs make the most sense if they are - lowered to calls into a language-supplied runtime. For example, if - you add hash tables to the language, it would probably make sense to - add the routines to a runtime, instead of inlining them all the way. -- **memory management** - Currently we can only access the stack in - Kaleidoscope. It would also be useful to be able to allocate heap - memory, either with calls to the standard libc malloc/free interface - or with a garbage collector. If you would like to use garbage - collection, note that LLVM fully supports `Accurate Garbage - Collection <../GarbageCollection.html>`_ including algorithms that - move objects and need to scan/update the stack. -- **debugger support** - LLVM supports generation of `DWARF Debug - info <../SourceLevelDebugging.html>`_ which is understood by common - debuggers like GDB. Adding support for debug info is fairly - straightforward. The best way to understand it is to compile some - C/C++ code with "``clang -g -O0``" and taking a look at what it - produces. -- **exception handling support** - LLVM supports generation of `zero - cost exceptions <../ExceptionHandling.html>`_ which interoperate with - code compiled in other languages. You could also generate code by - implicitly making every function return an error value and checking - it. You could also make explicit use of setjmp/longjmp. There are - many different ways to go here. -- **object orientation, generics, database access, complex numbers, - geometric programming, ...** - Really, there is no end of crazy - features that you can add to the language. -- **unusual domains** - We've been talking about applying LLVM to a - domain that many people are interested in: building a compiler for a - specific language. However, there are many other domains that can use - compiler technology that are not typically considered. For example, - LLVM has been used to implement OpenGL graphics acceleration, - translate C++ code to ActionScript, and many other cute and clever - things. Maybe you will be the first to JIT compile a regular - expression interpreter into native code with LLVM? - -Have fun - try doing something crazy and unusual. Building a language -like everyone else always has, is much less fun than trying something a -little crazy or off the wall and seeing how it turns out. If you get -stuck or want to talk about it, feel free to email the `llvmdev mailing -list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_: it has lots -of people who are interested in languages and are often willing to help -out. - -Before we end this tutorial, I want to talk about some "tips and tricks" -for generating LLVM IR. These are some of the more subtle things that -may not be obvious, but are very useful if you want to take advantage of -LLVM's capabilities. - -Properties of the LLVM IR -========================= - -We have a couple common questions about code in the LLVM IR form - lets -just get these out of the way right now, shall we? - -Target Independence -------------------- - -Kaleidoscope is an example of a "portable language": any program written -in Kaleidoscope will work the same way on any target that it runs on. -Many other languages have this property, e.g. 
lisp, java, haskell, -javascript, python, etc (note that while these languages are portable, -not all their libraries are). - -One nice aspect of LLVM is that it is often capable of preserving target -independence in the IR: you can take the LLVM IR for a -Kaleidoscope-compiled program and run it on any target that LLVM -supports, even emitting C code and compiling that on targets that LLVM -doesn't support natively. You can trivially tell that the Kaleidoscope -compiler generates target-independent code because it never queries for -any target-specific information when generating code. - -The fact that LLVM provides a compact, target-independent, -representation for code gets a lot of people excited. Unfortunately, -these people are usually thinking about C or a language from the C -family when they are asking questions about language portability. I say -"unfortunately", because there is really no way to make (fully general) -C code portable, other than shipping the source code around (and of -course, C source code is not actually portable in general either - ever -port a really old application from 32- to 64-bits?). - -The problem with C (again, in its full generality) is that it is heavily -laden with target specific assumptions. As one simple example, the -preprocessor often destructively removes target-independence from the -code when it processes the input text: - -.. code-block:: c - - #ifdef __i386__ - int X = 1; - #else - int X = 42; - #endif - -While it is possible to engineer more and more complex solutions to -problems like this, it cannot be solved in full generality in a way that -is better than shipping the actual source code. - -That said, there are interesting subsets of C that can be made portable. -If you are willing to fix primitive types to a fixed size (say int = -32-bits, and long = 64-bits), don't care about ABI compatibility with -existing binaries, and are willing to give up some other minor features, -you can have portable code. This can make sense for specialized domains -such as an in-kernel language. - -Safety Guarantees ------------------ - -Many of the languages above are also "safe" languages: it is impossible -for a program written in Java to corrupt its address space and crash the -process (assuming the JVM has no bugs). Safety is an interesting -property that requires a combination of language design, runtime -support, and often operating system support. - -It is certainly possible to implement a safe language in LLVM, but LLVM -IR does not itself guarantee safety. The LLVM IR allows unsafe pointer -casts, use after free bugs, buffer over-runs, and a variety of other -problems. Safety needs to be implemented as a layer on top of LLVM and, -conveniently, several groups have investigated this. Ask on the `llvmdev -mailing list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ if -you are interested in more details. - -Language-Specific Optimizations -------------------------------- - -One thing about LLVM that turns off many people is that it does not -solve all the world's problems in one system (sorry 'world hunger', -someone else will have to solve you some other day). One specific -complaint is that people perceive LLVM as being incapable of performing -high-level language-specific optimization: LLVM "loses too much -information". - -Unfortunately, this is really not the place to give you a full and -unified version of "Chris Lattner's theory of compiler design". 
Instead, -I'll make a few observations: - -First, you're right that LLVM does lose information. For example, as of -this writing, there is no way to distinguish in the LLVM IR whether an -SSA-value came from a C "int" or a C "long" on an ILP32 machine (other -than debug info). Both get compiled down to an 'i32' value and the -information about what it came from is lost. The more general issue -here, is that the LLVM type system uses "structural equivalence" instead -of "name equivalence". Another place this surprises people is if you -have two types in a high-level language that have the same structure -(e.g. two different structs that have a single int field): these types -will compile down into a single LLVM type and it will be impossible to -tell what it came from. - -Second, while LLVM does lose information, LLVM is not a fixed target: we -continue to enhance and improve it in many different ways. In addition -to adding new features (LLVM did not always support exceptions or debug -info), we also extend the IR to capture important information for -optimization (e.g. whether an argument is sign or zero extended, -information about pointers aliasing, etc). Many of the enhancements are -user-driven: people want LLVM to include some specific feature, so they -go ahead and extend it. - -Third, it is *possible and easy* to add language-specific optimizations, -and you have a number of choices in how to do it. As one trivial -example, it is easy to add language-specific optimization passes that -"know" things about code compiled for a language. In the case of the C -family, there is an optimization pass that "knows" about the standard C -library functions. If you call "exit(0)" in main(), it knows that it is -safe to optimize that into "return 0;" because C specifies what the -'exit' function does. - -In addition to simple library knowledge, it is possible to embed a -variety of other language-specific information into the LLVM IR. If you -have a specific need and run into a wall, please bring the topic up on -the llvmdev list. At the very worst, you can always treat LLVM as if it -were a "dumb code generator" and implement the high-level optimizations -you desire in your front-end, on the language-specific AST. - -Tips and Tricks -=============== - -There is a variety of useful tips and tricks that you come to know after -working on/with LLVM that aren't obvious at first glance. Instead of -letting everyone rediscover them, this section talks about some of these -issues. - -Implementing portable offsetof/sizeof -------------------------------------- - -One interesting thing that comes up, if you are trying to keep the code -generated by your compiler "target independent", is that you often need -to know the size of some LLVM type or the offset of some field in an -llvm structure. For example, you might need to pass the size of a type -into a function that allocates memory. - -Unfortunately, this can vary widely across targets: for example the -width of a pointer is trivially target-specific. However, there is a -`clever way to use the getelementptr -instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_ -that allows you to compute this in a portable way. - -Garbage Collected Stack Frames ------------------------------- - -Some languages want to explicitly manage their stack frames, often so -that they are garbage collected or to allow easy implementation of -closures. 
There are often better ways to implement these features than -explicit stack frames, but `LLVM does support -them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_ -if you want. It requires your front-end to convert the code into -`Continuation Passing -Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and -the use of tail calls (which LLVM also supports). +Chapter 8 Introduction +====================== + +Welcome to Chapter 8 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. In chapters 1 through 7, we've built a +decent little programming language with functions and variables. +What happens if something goes wrong though, how do you debug your +program? + +Source level debugging uses formatted data that helps a debugger +translate from binary and the state of the machine back to the +source that the programmer wrote. In LLVM we generally use a format +called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding +that represents types, source locations, and variable locations. + +The short summary of this chapter is that we'll go through the +various things you have to add to a programming language to +support debug info, and how you translate that into DWARF. + +Caveat: For now we can't debug via the JIT, so we'll need to compile +our program down to something small and standalone. As part of this +we'll make a few modifications to the running of the language and +how programs are compiled. This means that we'll have a source file +with a simple program written in Kaleidoscope rather than the +interactive JIT. It does involve a limitation that we can only +have one "top level" command at a time to reduce the number of +changes necessary. + +Here's the sample program we'll be compiling: + +.. code-block:: python + + def fib(x) + if x < 3 then + 1 + else + fib(x-1)+fib(x-2); + + fib(10) + + +Why is this a hard problem? +=========================== + +Debug information is a hard problem for a few different reasons - mostly +centered around optimized code. First, optimization makes keeping source +locations more difficult. In LLVM IR we keep the original source location +for each IR level instruction on the instruction. Optimization passes +should keep the source locations for newly created instructions, but merged +instructions only get to keep a single location - this can cause jumping +around when stepping through optimized programs. Secondly, optimization +can move variables in ways that are either optimized out, shared in memory +with other variables, or difficult to track. For the purposes of this +tutorial we're going to avoid optimization (as you'll see with one of the +next sets of patches). + +Ahead-of-Time Compilation Mode +============================== + +To highlight only the aspects of adding debug information to a source +language without needing to worry about the complexities of JIT debugging +we're going to make a few changes to Kaleidoscope to support compiling +the IR emitted by the front end into a simple standalone program that +you can execute, debug, and see results. + +First we make our anonymous function that contains our top level +statement be our "main": + +.. code-block:: udiff + + - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); + + PrototypeAST *Proto = new PrototypeAST("main", std::vector<std::string>()); + +just with the simple change of giving it a name. + +Then we're going to remove the command line code wherever it exists: + +.. 
code-block:: udiff + + @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() { + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + - fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: + return; + @@ -1184,7 +1183,6 @@ int main() { + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + - fprintf(stderr, "ready> "); + getNextToken(); + +Lastly we're going to disable all of the optimization passes and the JIT so +that the only thing that happens after we're done parsing and generating +code is that the llvm IR goes to standard error: + +.. code-block:: udiff + + @@ -1108,17 +1108,8 @@ static void HandleExtern() { + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (FunctionAST *F = ParseTopLevelExpr()) { + - if (Function *LF = F->Codegen()) { + - // We're just doing this to make sure it executes. + - TheExecutionEngine->finalizeObject(); + - // JIT the function, returning a function pointer. + - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); + - + - // Cast it to the right type (takes no arguments, returns a double) so we + - // can call it as a native function. + - double (*FP)() = (double (*)())(intptr_t)FPtr; + - // Ignore the return value for this. + - (void)FP; + + if (!F->Codegen()) { + + fprintf(stderr, "Error generating code for top level expr"); + } + } else { + // Skip token for error recovery. + @@ -1439,11 +1459,11 @@ int main() { + // target lays out data structures. + TheModule->setDataLayout(TheExecutionEngine->getDataLayout()); + OurFPM.add(new DataLayoutPass()); + +#if 0 + OurFPM.add(createBasicAliasAnalysisPass()); + // Promote allocas to registers. + OurFPM.add(createPromoteMemoryToRegisterPass()); + @@ -1218,7 +1210,7 @@ int main() { + OurFPM.add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + OurFPM.add(createCFGSimplificationPass()); + - + + #endif + OurFPM.doInitialization(); + + // Set the global so the code gen can use this. + +This relatively small set of changes get us to the point that we can compile +our piece of Kaleidoscope language down to an executable program via this +command line: + +.. code-block:: bash + + Kaleidoscope-Ch8 < fib.ks | & clang -x ir - + +which gives an a.out/a.exe in the current working directory. + +Compile Unit +============ + +The top level container for a section of code in DWARF is a compile unit. +This contains the type and function data for an individual translation unit +(read: one file of source code). So the first thing we need to do is +construct one for our fib.ks file. + +DWARF Emission Setup +==================== + +Similar to the ``IRBuilder`` class we have a +```DIBuilder`` <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class +that helps in constructing debug metadata for an llvm IR file. It +corresponds 1:1 similarly to ``IRBuilder`` and llvm IR, but with nicer names. +Using it does require that you be more familiar with DWARF terminology than +you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you +read through the general documentation on the +```Metadata Format`` <http://llvm.org/docs/SourceLevelDebugging.html>`_ it +should be a little more clear. We'll be using this class to construct all +of our IR level descriptions. Construction for it takes a module so we +need to construct it shortly after we construct our module. We've left it +as a global static variable to make it a bit easier to use. 
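As a rough sketch of the ordering (using the tutorial's existing globals; the exact listing appears in the full code at the end of this chapter), ``main`` ends up looking roughly like this:

.. code-block:: c++

  static DIBuilder *DBuilder;   // file-level global, as described above

  int main() {
    // ... option parsing, lexer priming, etc. elided ...
    TheModule = new Module("my cool jit", getGlobalContext());
    DBuilder = new DIBuilder(*TheModule);  // construct right after the module
    // ... createCompileUnit etc., as shown below ...

    MainLoop();                            // parse and generate code

    DBuilder->finalize();                  // finalize debug info near the end of main,
    TheModule->dump();                     // before dumping the module
    return 0;
  }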
+ +Next we're going to create a small container to cache some of our frequent +data. The first will be our compile unit, but we'll also write a bit of +code for our one type since we won't have to worry about multiple typed +expressions: + +.. code-block:: c++ + + static DIBuilder *DBuilder; + + struct DebugInfo { + DICompileUnit TheCU; + DIType DblTy; + + DIType getDoubleTy(); + } KSDbgInfo; + + DIType DebugInfo::getDoubleTy() { + if (DblTy.isValid()) + return DblTy; + + DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float); + return DblTy; + } + +And then later on in ``main`` when we're constructing our module: + +.. code-block:: c++ + + DBuilder = new DIBuilder(*TheModule); + + KSDbgInfo.TheCU = DBuilder->createCompileUnit( + dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0); + +There are a couple of things to note here. First, while we're producing a +compile unit for a language called Kaleidoscope we used the language +constant for C. This is because a debugger wouldn't necessarily understand +the calling conventions or default ABI for a language it doesn't recognize +and we follow the C ABI in our llvm code generation so it's the closest +thing to accurate. This ensures we can actually call functions from the +debugger and have them execute. Secondly, you'll see the "fib.ks" in the +call to ``createCompileUnit``. This is a default hard coded value since +we're using shell redirection to put our source into the Kaleidoscope +compiler. In a usual front end you'd have an input file name and it would +go there. + +One last thing as part of emitting debug information via DIBuilder is that +we need to "finalize" the debug information. The reasons are part of the +underlying API for DIBuilder, but make sure you do this near the end of +main: + +.. code-block:: c++ + + DBuilder->finalize(); + +before you dump out the module. + +Functions +========= + +Now that we have our ``Compile Unit`` and our source locations, we can add +function definitions to the debug info. So in ``PrototypeAST::Codegen`` we +add a few lines of code to describe a context for our subprogram, in this +case the "File", and the actual definition of the function itself. + +So the context: + +.. code-block:: c++ + + DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), + KSDbgInfo.TheCU.getDirectory()); + +giving us a DIFile and asking the ``Compile Unit`` we created above for the +directory and filename where we are currently. Then, for now, we use some +source locations of 0 (since our AST doesn't currently have source location +information) and construct our function definition: + +.. code-block:: c++ + + DIDescriptor FContext(Unit); + unsigned LineNo = 0; + unsigned ScopeLine = 0; + DISubprogram SP = DBuilder->createFunction( + FContext, Name, StringRef(), Unit, LineNo, + CreateFunctionType(Args.size(), Unit), false /* internal linkage */, + true /* definition */, ScopeLine, DIDescriptor::FlagPrototyped, false, F); + +and we now have a DISubprogram that contains a reference to all of our metadata +for the function. + +Source Locations +================ + +The most important thing for debug information is accurate source location - +this makes it possible to map your source code back. We have a problem though, +Kaleidoscope really doesn't have any source location information in the lexer +or parser so we'll need to add it. + +.. 
code-block:: c++ + + struct SourceLocation { + int Line; + int Col; + }; + static SourceLocation CurLoc; + static SourceLocation LexLoc = {1, 0}; + + static int advance() { + int LastChar = getchar(); + + if (LastChar == '\n' || LastChar == '\r') { + LexLoc.Line++; + LexLoc.Col = 0; + } else + LexLoc.Col++; + return LastChar; + } + +In this set of code we've added some functionality to keep track of the +line and column of the "source file". As we lex every token we set our +current "lexical location" to the line and column for the beginning +of the token. We do this by overriding all of the previous calls to +``getchar()`` with our new ``advance()`` that keeps track of the information, +and then we add a source location to all of our AST classes: + +.. code-block:: c++ + + class ExprAST { + SourceLocation Loc; + + public: + int getLine() const { return Loc.Line; } + int getCol() const { return Loc.Col; } + ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} + virtual std::ostream &dump(std::ostream &out, int ind) { + return out << ':' << getLine() << ':' << getCol() << '\n'; + } + +that we pass down through when we create a new expression: + +.. code-block:: c++ + + LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS); + +giving us locations for each of our expressions and variables. + +From this we can make sure to tell ``DIBuilder`` when we're at a new source +location so it can use that when we generate the rest of our code and make +sure that each instruction has source location information. We do this +by constructing another small function: + +.. code-block:: c++ + + void DebugInfo::emitLocation(ExprAST *AST) { + DIScope *Scope; + if (LexicalBlocks.empty()) + Scope = &TheCU; + else + Scope = LexicalBlocks.back(); + Builder.SetCurrentDebugLocation( + DebugLoc::get(AST->getLine(), AST->getCol(), DIScope(*Scope))); + } + +that both tells the main ``IRBuilder`` where we are and what scope +we're in. Since we've just created a function above we can either be in +the main file scope (like when we created our function), or now we can be +in the function scope we just created. To represent this we create a stack +of scopes: + +.. code-block:: c++ + + std::vector<DIScope *> LexicalBlocks; + std::map<const PrototypeAST *, DIScope> FnScopeMap; + +and keep a map of each function to the scope that it represents (a DISubprogram +is also a DIScope). + +Then we make sure to: + +.. code-block:: c++ + + KSDbgInfo.emitLocation(this); + +emit the location every time we start to generate code for a new AST, and +also: + +.. code-block:: c++ + + KSDbgInfo.FnScopeMap[this] = SP; + +store the scope (function) when we create it and use it: + +.. code-block:: c++ + + KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]); + +when we start generating the code for each function. + +Also, don't forget to pop the scope back off of your scope stack at the +end of the code generation for the function: + +.. code-block:: c++ + + // Pop off the lexical block for the function since we added it + // unconditionally. + KSDbgInfo.LexicalBlocks.pop_back(); + +Variables +========= + +Now that we have functions, we need to be able to print out the variables +we have in scope. Let's get our function arguments set up so we can get +decent backtraces and see how our functions are being called. It isn't +a lot of code, and we generally handle it when we're creating the +argument allocas in ``PrototypeAST::CreateArgumentAllocas``. + +.. 
code-block:: c++ + + DIScope *Scope = KSDbgInfo.LexicalBlocks.back(); + DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), + KSDbgInfo.TheCU.getDirectory()); + DIVariable D = DBuilder->createLocalVariable(dwarf::DW_TAG_arg_variable, + *Scope, Args[Idx], Unit, Line, + KSDbgInfo.getDoubleTy(), Idx); + + Instruction *Call = DBuilder->insertDeclare( + Alloca, D, DBuilder->createExpression(), Builder.GetInsertBlock()); + Call->setDebugLoc(DebugLoc::get(Line, 0, *Scope)); + +Here we're doing a few things. First, we're grabbing our current scope +for the variable so we can say what range of code our variable is valid +through. Second, we're creating the variable, giving it the scope, +the name, source location, type, and since it's an argument, the argument +index. Third, we create an ``lvm.dbg.declare`` call to indicate at the IR +level that we've got a variable in an alloca (and it gives a starting +location for the variable). Lastly, we set a source location for the +beginning of the scope on the declare. + +One interesting thing to note at this point is that various debuggers have +assumptions based on how code and debug information was generated for them +in the past. In this case we need to do a little bit of a hack to avoid +generating line information for the function prologue so that the debugger +knows to skip over those instructions when setting a breakpoint. So in +``FunctionAST::CodeGen`` we add a couple of lines: + +.. code-block:: c++ + + // Unset the location for the prologue emission (leading instructions with no + // location in a function are considered part of the prologue and the debugger + // will run past them when breaking on a function) + KSDbgInfo.emitLocation(nullptr); + +and then emit a new location when we actually start generating code for the +body of the function: + +.. code-block:: c++ + + KSDbgInfo.emitLocation(Body); + +With this we have enough debug information to set breakpoints in functions, +print out argument variables, and call functions. Not too bad for just a +few simple lines of code! + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +debug information. To build this example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy + # Run + ./toy + +Here is the code: + +.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp + :language: c++ + +`Next: Conclusion and other useful LLVM tidbits <LangImpl9.html>`_ diff --git a/docs/tutorial/LangImpl9.rst b/docs/tutorial/LangImpl9.rst new file mode 100644 index 000000000000..6f694931ef86 --- /dev/null +++ b/docs/tutorial/LangImpl9.rst @@ -0,0 +1,267 @@ +====================================================== +Kaleidoscope: Conclusion and other useful LLVM tidbits +====================================================== + +.. contents:: + :local: + +Tutorial Conclusion +=================== + +Welcome to the final chapter of the "`Implementing a language with +LLVM <index.html>`_" tutorial. In the course of this tutorial, we have +grown our little Kaleidoscope language from being a useless toy, to +being a semi-interesting (but probably still useless) toy. :) + +It is interesting to see how far we've come, and how little code it has +taken. We built the entire lexer, parser, AST, code generator, and an +interactive run-loop (with a JIT!) by-hand in under 700 lines of +(non-comment/non-blank) code. 
+ +Our little language supports a couple of interesting features: it +supports user defined binary and unary operators, it uses JIT +compilation for immediate evaluation, and it supports a few control flow +constructs with SSA construction. + +Part of the idea of this tutorial was to show you how easy and fun it +can be to define, build, and play with languages. Building a compiler +need not be a scary or mystical process! Now that you've seen some of +the basics, I strongly encourage you to take the code and hack on it. +For example, try adding: + +- **global variables** - While global variables have questional value + in modern software engineering, they are often useful when putting + together quick little hacks like the Kaleidoscope compiler itself. + Fortunately, our current setup makes it very easy to add global + variables: just have value lookup check to see if an unresolved + variable is in the global variable symbol table before rejecting it. + To create a new global variable, make an instance of the LLVM + ``GlobalVariable`` class. +- **typed variables** - Kaleidoscope currently only supports variables + of type double. This gives the language a very nice elegance, because + only supporting one type means that you never have to specify types. + Different languages have different ways of handling this. The easiest + way is to require the user to specify types for every variable + definition, and record the type of the variable in the symbol table + along with its Value\*. +- **arrays, structs, vectors, etc** - Once you add types, you can start + extending the type system in all sorts of interesting ways. Simple + arrays are very easy and are quite useful for many different + applications. Adding them is mostly an exercise in learning how the + LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction + works: it is so nifty/unconventional, it `has its own + FAQ <../GetElementPtr.html>`_! If you add support for recursive types + (e.g. linked lists), make sure to read the `section in the LLVM + Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that + describes how to construct them. +- **standard runtime** - Our current language allows the user to access + arbitrary external functions, and we use it for things like "printd" + and "putchard". As you extend the language to add higher-level + constructs, often these constructs make the most sense if they are + lowered to calls into a language-supplied runtime. For example, if + you add hash tables to the language, it would probably make sense to + add the routines to a runtime, instead of inlining them all the way. +- **memory management** - Currently we can only access the stack in + Kaleidoscope. It would also be useful to be able to allocate heap + memory, either with calls to the standard libc malloc/free interface + or with a garbage collector. If you would like to use garbage + collection, note that LLVM fully supports `Accurate Garbage + Collection <../GarbageCollection.html>`_ including algorithms that + move objects and need to scan/update the stack. +- **debugger support** - LLVM supports generation of `DWARF Debug + info <../SourceLevelDebugging.html>`_ which is understood by common + debuggers like GDB. Adding support for debug info is fairly + straightforward. The best way to understand it is to compile some + C/C++ code with "``clang -g -O0``" and taking a look at what it + produces. 
+- **exception handling support** - LLVM supports generation of `zero + cost exceptions <../ExceptionHandling.html>`_ which interoperate with + code compiled in other languages. You could also generate code by + implicitly making every function return an error value and checking + it. You could also make explicit use of setjmp/longjmp. There are + many different ways to go here. +- **object orientation, generics, database access, complex numbers, + geometric programming, ...** - Really, there is no end of crazy + features that you can add to the language. +- **unusual domains** - We've been talking about applying LLVM to a + domain that many people are interested in: building a compiler for a + specific language. However, there are many other domains that can use + compiler technology that are not typically considered. For example, + LLVM has been used to implement OpenGL graphics acceleration, + translate C++ code to ActionScript, and many other cute and clever + things. Maybe you will be the first to JIT compile a regular + expression interpreter into native code with LLVM? + +Have fun - try doing something crazy and unusual. Building a language +like everyone else always has, is much less fun than trying something a +little crazy or off the wall and seeing how it turns out. If you get +stuck or want to talk about it, feel free to email the `llvmdev mailing +list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_: it has lots +of people who are interested in languages and are often willing to help +out. + +Before we end this tutorial, I want to talk about some "tips and tricks" +for generating LLVM IR. These are some of the more subtle things that +may not be obvious, but are very useful if you want to take advantage of +LLVM's capabilities. + +Properties of the LLVM IR +========================= + +We have a couple common questions about code in the LLVM IR form - lets +just get these out of the way right now, shall we? + +Target Independence +------------------- + +Kaleidoscope is an example of a "portable language": any program written +in Kaleidoscope will work the same way on any target that it runs on. +Many other languages have this property, e.g. lisp, java, haskell, +javascript, python, etc (note that while these languages are portable, +not all their libraries are). + +One nice aspect of LLVM is that it is often capable of preserving target +independence in the IR: you can take the LLVM IR for a +Kaleidoscope-compiled program and run it on any target that LLVM +supports, even emitting C code and compiling that on targets that LLVM +doesn't support natively. You can trivially tell that the Kaleidoscope +compiler generates target-independent code because it never queries for +any target-specific information when generating code. + +The fact that LLVM provides a compact, target-independent, +representation for code gets a lot of people excited. Unfortunately, +these people are usually thinking about C or a language from the C +family when they are asking questions about language portability. I say +"unfortunately", because there is really no way to make (fully general) +C code portable, other than shipping the source code around (and of +course, C source code is not actually portable in general either - ever +port a really old application from 32- to 64-bits?). + +The problem with C (again, in its full generality) is that it is heavily +laden with target specific assumptions. 
As one simple example, the +preprocessor often destructively removes target-independence from the +code when it processes the input text: + +.. code-block:: c + + #ifdef __i386__ + int X = 1; + #else + int X = 42; + #endif + +While it is possible to engineer more and more complex solutions to +problems like this, it cannot be solved in full generality in a way that +is better than shipping the actual source code. + +That said, there are interesting subsets of C that can be made portable. +If you are willing to fix primitive types to a fixed size (say int = +32-bits, and long = 64-bits), don't care about ABI compatibility with +existing binaries, and are willing to give up some other minor features, +you can have portable code. This can make sense for specialized domains +such as an in-kernel language. + +Safety Guarantees +----------------- + +Many of the languages above are also "safe" languages: it is impossible +for a program written in Java to corrupt its address space and crash the +process (assuming the JVM has no bugs). Safety is an interesting +property that requires a combination of language design, runtime +support, and often operating system support. + +It is certainly possible to implement a safe language in LLVM, but LLVM +IR does not itself guarantee safety. The LLVM IR allows unsafe pointer +casts, use after free bugs, buffer over-runs, and a variety of other +problems. Safety needs to be implemented as a layer on top of LLVM and, +conveniently, several groups have investigated this. Ask on the `llvmdev +mailing list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ if +you are interested in more details. + +Language-Specific Optimizations +------------------------------- + +One thing about LLVM that turns off many people is that it does not +solve all the world's problems in one system (sorry 'world hunger', +someone else will have to solve you some other day). One specific +complaint is that people perceive LLVM as being incapable of performing +high-level language-specific optimization: LLVM "loses too much +information". + +Unfortunately, this is really not the place to give you a full and +unified version of "Chris Lattner's theory of compiler design". Instead, +I'll make a few observations: + +First, you're right that LLVM does lose information. For example, as of +this writing, there is no way to distinguish in the LLVM IR whether an +SSA-value came from a C "int" or a C "long" on an ILP32 machine (other +than debug info). Both get compiled down to an 'i32' value and the +information about what it came from is lost. The more general issue +here, is that the LLVM type system uses "structural equivalence" instead +of "name equivalence". Another place this surprises people is if you +have two types in a high-level language that have the same structure +(e.g. two different structs that have a single int field): these types +will compile down into a single LLVM type and it will be impossible to +tell what it came from. + +Second, while LLVM does lose information, LLVM is not a fixed target: we +continue to enhance and improve it in many different ways. In addition +to adding new features (LLVM did not always support exceptions or debug +info), we also extend the IR to capture important information for +optimization (e.g. whether an argument is sign or zero extended, +information about pointers aliasing, etc). Many of the enhancements are +user-driven: people want LLVM to include some specific feature, so they +go ahead and extend it. 
+ +Third, it is *possible and easy* to add language-specific optimizations, +and you have a number of choices in how to do it. As one trivial +example, it is easy to add language-specific optimization passes that +"know" things about code compiled for a language. In the case of the C +family, there is an optimization pass that "knows" about the standard C +library functions. If you call "exit(0)" in main(), it knows that it is +safe to optimize that into "return 0;" because C specifies what the +'exit' function does. + +In addition to simple library knowledge, it is possible to embed a +variety of other language-specific information into the LLVM IR. If you +have a specific need and run into a wall, please bring the topic up on +the llvmdev list. At the very worst, you can always treat LLVM as if it +were a "dumb code generator" and implement the high-level optimizations +you desire in your front-end, on the language-specific AST. + +Tips and Tricks +=============== + +There is a variety of useful tips and tricks that you come to know after +working on/with LLVM that aren't obvious at first glance. Instead of +letting everyone rediscover them, this section talks about some of these +issues. + +Implementing portable offsetof/sizeof +------------------------------------- + +One interesting thing that comes up, if you are trying to keep the code +generated by your compiler "target independent", is that you often need +to know the size of some LLVM type or the offset of some field in an +llvm structure. For example, you might need to pass the size of a type +into a function that allocates memory. + +Unfortunately, this can vary widely across targets: for example the +width of a pointer is trivially target-specific. However, there is a +`clever way to use the getelementptr +instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_ +that allows you to compute this in a portable way. + +Garbage Collected Stack Frames +------------------------------ + +Some languages want to explicitly manage their stack frames, often so +that they are garbage collected or to allow easy implementation of +closures. There are often better ways to implement these features than +explicit stack frames, but `LLVM does support +them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_ +if you want. It requires your front-end to convert the code into +`Continuation Passing +Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and +the use of tail calls (which LLVM also supports). + |
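Tying back to the "Implementing portable offsetof/sizeof" tip above, a minimal IRBuilder-based sketch of the getelementptr trick might look like the following. The function and value names are illustrative, as is the choice of ``i64`` for the result width; it is a sketch, not the canonical implementation:

.. code-block:: c++

  // Portable sizeof(T): a GEP that steps one T past a null T* yields an
  // address numerically equal to the size of T, which we then convert to
  // an integer.
  llvm::Value *emitSizeOf(llvm::IRBuilder<> &Builder, llvm::Type *T) {
    llvm::LLVMContext &Ctx = T->getContext();
    llvm::Value *Null = llvm::ConstantPointerNull::get(T->getPointerTo());
    llvm::Value *One = llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 1);
    llvm::Value *End = Builder.CreateGEP(Null, One, "sizeof.gep");
    return Builder.CreatePtrToInt(End, llvm::Type::getInt64Ty(Ctx), "sizeof");
  }

The same null-pointer idea, using a struct GEP to index a particular field instead of stepping past the whole object, gives a portable offsetof.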