3 files changed, 116 insertions, 80 deletions
diff --git a/docs/LanguageExtensions.rst b/docs/LanguageExtensions.rst
index e155cefb7890..5782edd35370 100644
--- a/docs/LanguageExtensions.rst
+++ b/docs/LanguageExtensions.rst
@@ -474,44 +474,58 @@ Half-Precision Floating Point
 =============================
 
 Clang supports two half-precision (16-bit) floating point types: ``__fp16`` and
-``_Float16``. ``__fp16`` is defined in the ARM C Language Extensions (`ACLE
-<http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.pdf>`_)
-and ``_Float16`` in ISO/IEC TS 18661-3:2015.
+``_Float16``.  These types are supported in all language modes.
 
-``__fp16`` is a storage and interchange format only. This means that values of
-``__fp16`` promote to (at least) float when used in arithmetic operations.
-There are two ``__fp16`` formats. Clang supports the IEEE 754-2008 format and
-not the ARM alternative format.
+``__fp16`` is supported on every target, as it is purely a storage format; see below.
+``_Float16`` is currently only supported on the following targets, with further
+targets pending ABI standardization:
+- 32-bit ARM
+- 64-bit ARM (AArch64)
+- SPIR
+``_Float16`` will be supported on more targets as they define ABIs for it.
 
-ISO/IEC TS 18661-3:2015 defines C support for additional floating point types.
-``_FloatN`` is defined as a binary floating type, where the N suffix denotes
-the number of bits and is 16, 32, 64, or greater and equal to 128 and a
-multiple of 32. Clang supports ``_Float16``. The difference from ``__fp16`` is
-that arithmetic on ``_Float16`` is performed in half-precision, thus it is not
-a storage-only format. ``_Float16`` is available as a source language type in
-both C and C++ mode.
+``__fp16`` is a storage and interchange format only.  This means that values of
+``__fp16`` are immediately promoted to (at least) ``float`` when used in arithmetic
+operations, so that e.g. the result of adding two ``__fp16`` values has type ``float``.
+The behavior of ``__fp16`` is specified by the ARM C Language Extensions (`ACLE <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.pdf>`_).
+Clang uses the ``binary16`` format from IEEE 754-2008 for ``__fp16``, not the ARM
+alternative format.
 
-It is recommended that portable code use the ``_Float16`` type because
-``__fp16`` is an ARM C-Language Extension (ACLE), whereas ``_Float16`` is
-defined by the C standards committee, so using ``_Float16`` will not prevent
-code from being ported to architectures other than Arm.  Also, ``_Float16``
-arithmetic and operations will directly map on half-precision instructions when
-they are available (e.g. Armv8.2-A), avoiding conversions to/from
-single-precision, and thus will result in more performant code. If
-half-precision instructions are unavailable, values will be promoted to
-single-precision, similar to the semantics of ``__fp16`` except that the
-results will be stored in single-precision.
+``_Float16`` is an extended floating-point type.  This means that, just like arithmetic on
+``float`` or ``double``, arithmetic on ``_Float16`` operands is formally performed in the
+``_Float16`` type, so that e.g. the result of adding two ``_Float16`` values has type
+``_Float16``.  The behavior of ``_Float16`` is specified by ISO/IEC TS 18661-3:2015
+("Floating-point extensions for C").  As with ``__fp16``, Clang uses the ``binary16``
+format from IEEE 754-2008 for ``_Float16``.
 
-In an arithmetic operation where one operand is of ``__fp16`` type and the
-other is of ``_Float16`` type, the ``_Float16`` type is first converted to
-``__fp16`` type and then the operation is completed as if both operands were of
-``__fp16`` type.
+``_Float16`` arithmetic will be performed using native half-precision support
+when available on the target (e.g. on ARMv8.2a); otherwise it will be performed
+at a higher precision (currently always ``float``) and then truncated down to
+``_Float16``.  Note that C and C++ allow intermediate floating-point operands
+of an expression to be computed with greater precision than is expressible in
+their type, so Clang may avoid intermediate truncations in certain cases; this may
+lead to results that are inconsistent with native arithmetic.
 
-To define a ``_Float16`` literal, suffix ``f16`` can be appended to the compile-time
-constant declaration. There is no default argument promotion for ``_Float16``; this
-applies to the standard floating types only. As a consequence, for example, an
-explicit cast is required for printing a ``_Float16`` value (there is no string
-format specifier for ``_Float16``).
+It is recommended that portable code use ``_Float16`` instead of ``__fp16``,
+as it has been defined by the C standards committee and has behavior that is
+more familiar to most programmers.
+
+Because ``__fp16`` operands are always immediately promoted to ``float``, the
+common real type of ``__fp16`` and ``_Float16`` for the purposes of the usual
+arithmetic conversions is ``float``.
+
+A literal can be given ``_Float16`` type using the suffix ``f16``; for example:
+```
+3.14f16
+```
+
+Because default argument promotion only applies to the standard floating-point
+types, ``_Float16`` values are not promoted to ``double`` when passed as variadic
+or untyped arguments.  As a consequence, some caution must be taken when using
+certain library facilities with ``_Float16``; for example, there is no ``printf`` format
+specifier for ``_Float16``, and (unlike ``float``) it will not be implicitly promoted to
+``double`` when passed to ``printf``, so the programmer must explicitly cast it to
+``double`` before using it with an ``%f`` or similar specifier.
 
 Messages on ``deprecated`` and ``unavailable`` Attributes
 =========================================================
diff --git a/docs/OpenMPSupport.rst b/docs/OpenMPSupport.rst
index 04a9648ca294..7b567c966ee5 100644
--- a/docs/OpenMPSupport.rst
+++ b/docs/OpenMPSupport.rst
@@ -17,60 +17,50 @@
 OpenMP Support
 ==================
 
-Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
-PPC64[LE] and has `basic support for Cuda devices`_.
-
-Standalone directives
-=====================
-
-* #pragma omp [for] simd: :good:`Complete`.
-
-* #pragma omp declare simd: :partial:`Partial`.  We support parsing/semantic
-  analysis + generation of special attributes for X86 target, but still
-  missing the LLVM pass for vectorization.
-
-* #pragma omp taskloop [simd]: :good:`Complete`.
-
-* #pragma omp target [enter|exit] data: :good:`Complete`.
-
-* #pragma omp target update: :good:`Complete`.
-
-* #pragma omp target: :good:`Complete`.
+Clang supports the following OpenMP 5.0 features
 
-* #pragma omp declare target: :good:`Complete`.
+* The `reduction`-based clauses in the `task` and `target`-based directives.
 
-* #pragma omp teams: :good:`Complete`.
+* Support relational-op != (not-equal) as one of the canonical forms of random
+  access iterator.
 
-* #pragma omp distribute [simd]: :good:`Complete`.
+* Support for mapping of the lambdas in target regions.
 
-* #pragma omp distribute parallel for [simd]: :good:`Complete`.
+* Parsing/sema analysis for the requires directive.
 
-Combined directives
-===================
+* Nested declare target directives.
 
-* #pragma omp parallel for simd: :good:`Complete`.
+* Make the `this` pointer implicitly mapped as `map(this[:1])`.
 
-* #pragma omp target parallel: :good:`Complete`.
+* The `close` *map-type-modifier*.
 
-* #pragma omp target parallel for [simd]: :good:`Complete`.
-
-* #pragma omp target simd: :good:`Complete`.
-
-* #pragma omp target teams: :good:`Complete`.
-
-* #pragma omp teams distribute [simd]: :good:`Complete`.
-
-* #pragma omp target teams distribute [simd]: :good:`Complete`.
+Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
+PPC64[LE] and has `basic support for Cuda devices`_.
 
-* #pragma omp teams distribute parallel for [simd]: :good:`Complete`.
+* #pragma omp declare simd: :partial:`Partial`.  We support parsing/semantic
+  analysis + generation of special attributes for X86 target, but still
+  missing the LLVM pass for vectorization.
 
-* #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.
+In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
+Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.
 
-Clang does not support any constructs/updates from OpenMP 5.0 except
-for `reduction`-based clauses in the `task` and `target`-based directives.
+General improvements
+--------------------
+- New collapse clause scheme to avoid expensive remainder operations.
+  Compute loop index variables after collapsing a loop nest via the
+  collapse clause by replacing the expensive remainder operation with
+  multiplications and additions.
 
-In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
-Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
+- The default schedules for the `distribute` and `for` constructs in a
+  parallel region and in SPMD mode have changed to ensure coalesced
+  accesses. For the `distribute` construct, a static schedule is used
+  with a chunk size equal to the number of threads per team (default
+  value of threads or as specified by the `thread_limit` clause if
+  present). For the `for` construct, the schedule is static with chunk
+  size of one.
+  
+- Simplified SPMD code generation for `distribute parallel for` when
+  the new default schedules are applicable.
 
 .. _basic support for Cuda devices:
 
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index b6a405dbc78b..50bf636a51f4 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -127,6 +127,10 @@ Non-comprehensive list of changes in this release
   manually and rely on the old behaviour you will need to add appropriate
   compiler flags for finding the corresponding libc++ include directory.
 
+- The integrated assembler is used now by default for all MIPS targets.
+
+- Improved support for MIPS N32 ABI and MIPS R6 target triples.
+
 New Compiler Flags
 ------------------
 
@@ -136,6 +140,13 @@ New Compiler Flags
   instrumenting for gcov-based profiling.
   See the :doc:`UsersManual` for details.
 
+- When using a custom stack alignment, the ``stackrealign`` attribute is now
+  implicitly set on the main function.
+
+- Emission of ``R_MIPS_JALR`` and ``R_MICROMIPS_JALR`` relocations can now
+  be controlled by the ``-mrelax-pic-calls`` and ``-mno-relax-pic-calls``
+  options.
+
 - ...
 
 Deprecated Compiler Flags
@@ -179,6 +190,15 @@ Windows Support
   `dllexport` and `dllimport` attributes not apply to inline member functions.
   This can significantly reduce compile and link times. See the `User's Manual
   <UsersManual.html#the-zc-dllexportinlines-option>`_ for more info.
+
+- For MinGW, ``-municode`` now correctly defines ``UNICODE`` during
+  preprocessing.
+
+- For MinGW, clang now produces vtables and RTTI for dllexported classes
+  without key functions. This fixes building Qt in debug mode.
+
+- Allow using Address Sanitizer and Undefined Behaviour Sanitizer on MinGW.
+
 - ...
 
 
@@ -233,12 +253,15 @@ ABI Changes in Clang
 OpenMP Support in Clang
 ----------------------------------
 
-- Support relational-op != (not-equal) as one of the canonical forms of random
-  access iterator.
-
-- Added support for mapping of the lambdas in target regions.
+- OpenMP 5.0 features
 
-- Added parsing/sema analysis for OpenMP 5.0 requires directive.
+  - Support relational-op != (not-equal) as one of the canonical forms of random
+    access iterator.
+  - Added support for mapping of the lambdas in target regions.
+  - Added parsing/sema analysis for the requires directive.
+  - Support nested declare target directives.
+  - Make the `this` pointer implicitly mapped as `map(this[:1])`.
+  - Added the `close` *map-type-modifier*.
 
 - Various bugfixes and improvements.
 
@@ -250,6 +273,15 @@ New features supported for Cuda devices:
 
 - Fixed support for lastprivate/reduction variables in SPMD constructs.
 
+- New collapse clause scheme to avoid expensive remainder operations.
+
+- New default schedule for distribute and parallel constructs.
+
+- Simplified code generation for distribute and parallel in SPMD mode.
+
+- Flag (``-fopenmp_optimistic_collapse``) for user to limit collapsed
+  loop counter width when safe to do so.
+
 - General performance improvement.
 
 CUDA Support in Clang