summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorDimitry Andric <dim@FreeBSD.org>2017-02-25 14:40:33 +0000
committerDimitry Andric <dim@FreeBSD.org>2017-02-25 14:40:33 +0000
commit9c618dddcd075579cf4f157ba4a03d088c166dab (patch)
tree25c9f5ba1aa4fdc4e96faae87aec82f1ad0b0a30
parent5a813558fccf203e1f8733f4c8217a03fd87b430 (diff)
downloadsrc-test-9c618dddcd075579cf4f157ba4a03d088c166dab.tar.gz
src-test-9c618dddcd075579cf4f157ba4a03d088c166dab.zip
Vendor import of llvm release_40 branch r296202:vendor/llvm/llvm-release_40-r296202
Notes
Notes: svn path=/vendor/llvm/dist/; revision=314258 svn path=/vendor/llvm/llvm-release_40-r296202/; revision=314259; tag=vendor/llvm/llvm-release_40-r296202
-rw-r--r--docs/ReleaseNotes.rst107
-rw-r--r--include/llvm/Transforms/Vectorize/SLPVectorizer.h6
-rw-r--r--lib/Target/AMDGPU/SIInstructions.td5
-rw-r--r--lib/Target/AMDGPU/VOP1Instructions.td6
-rw-r--r--lib/Transforms/Scalar/CorrelatedValuePropagation.cpp5
-rw-r--r--lib/Transforms/Scalar/Reassociate.cpp8
-rw-r--r--lib/Transforms/Vectorize/SLPVectorizer.cpp265
-rw-r--r--test/CodeGen/AMDGPU/trunc.ll91
-rw-r--r--test/Transforms/CorrelatedValuePropagation/add.ll2
-rw-r--r--test/Transforms/Reassociate/basictest.ll20
-rw-r--r--test/Transforms/SLPVectorizer/X86/horizontal-list.ll158
11 files changed, 295 insertions, 378 deletions
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index 6efa4813d7703..4ddba6be2fec7 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -5,12 +5,6 @@ LLVM 4.0.0 Release Notes
.. contents::
:local:
-.. warning::
- These are in-progress notes for the upcoming LLVM 4.0.0 release. You may
- prefer the `LLVM 3.9 Release Notes <http://llvm.org/releases/3.9.0/docs
- /ReleaseNotes.html>`_.
-
-
Introduction
============
@@ -28,74 +22,56 @@ them.
Non-comprehensive list of changes in this release
=================================================
-* The C API functions LLVMAddFunctionAttr, LLVMGetFunctionAttr,
- LLVMRemoveFunctionAttr, LLVMAddAttribute, LLVMRemoveAttribute,
- LLVMGetAttribute, LLVMAddInstrAttribute and
- LLVMRemoveInstrAttribute have been removed.
-
-* The C API enum LLVMAttribute has been deleted.
+* Minimum compiler version to build has been raised to GCC 4.8 and VS 2015.
-.. NOTE
- For small 1-3 sentence descriptions, just add an entry at the end of
- this list. If your description won't fit comfortably in one bullet
- point (e.g. maybe you would like to give an example of the
- functionality, or simply have a lot to talk about), see the `NOTE` below
- for adding a new subsection.
+* The C API functions ``LLVMAddFunctionAttr``, ``LLVMGetFunctionAttr``,
+ ``LLVMRemoveFunctionAttr``, ``LLVMAddAttribute``, ``LLVMRemoveAttribute``,
+ ``LLVMGetAttribute``, ``LLVMAddInstrAttribute`` and
+ ``LLVMRemoveInstrAttribute`` have been removed.
-* The definition and uses of LLVM_ATRIBUTE_UNUSED_RESULT in the LLVM source
- were replaced with LLVM_NODISCARD, which matches the C++17 [[nodiscard]]
- semantics rather than gcc's __attribute__((warn_unused_result)).
+* The C API enum ``LLVMAttribute`` has been deleted.
-* Minimum compiler version to build has been raised to GCC 4.8 and VS 2015.
+* The definition and uses of ``LLVM_ATRIBUTE_UNUSED_RESULT`` in the LLVM source
+ were replaced with ``LLVM_NODISCARD``, which matches the C++17 ``[[nodiscard]]``
+ semantics rather than gcc's ``__attribute__((warn_unused_result))``.
* The Timer related APIs now expect a Name and Description. When upgrading code
the previously used names should become descriptions and a short name in the
style of a programming language identifier should be added.
-* LLVM now handles invariant.group across different basic blocks, which makes
+* LLVM now handles ``invariant.group`` across different basic blocks, which makes
it possible to devirtualize virtual calls inside loops.
-* The aggressive dead code elimination phase ("adce") now remove
+* The aggressive dead code elimination phase ("adce") now removes
branches which do not effect program behavior. Loops are retained by
default since they may be infinite but these can also be removed
- with LLVM option -adce-remove-loops when the loop body otherwise has
+ with LLVM option ``-adce-remove-loops`` when the loop body otherwise has
no live operations.
* The GVNHoist pass is now enabled by default. The new pass based on Global
Value Numbering detects similar computations in branch code and replaces
multiple instances of the same computation with a unique expression. The
transform benefits code size and generates better schedules. GVNHoist is
- more aggressive at -Os and -Oz, hoisting more expressions at the expense of
- execution time degradations.
+ more aggressive at ``-Os`` and ``-Oz``, hoisting more expressions at the
+ expense of execution time degradations.
* The llvm-cov tool can now export coverage data as json. Its html output mode
has also improved.
-* ... next change ...
-
-.. NOTE
- If you would like to document a larger change, then you can add a
- subsection about it right here. You can copy the following boilerplate
- and un-indent it (the indentation causes it to be inside this comment).
+Improvements to ThinLTO (-flto=thin)
+------------------------------------
+Integration with profile data (PGO). When available, profile data
+enables more accurate function importing decisions, as well as
+cross-module indirect call promotion.
- Special New Feature
- -------------------
-
- Makes programs 10x faster by doing Special New Thing.
-
- Improvements to ThinLTO (-flto=thin)
- ------------------------------------
- * Integration with profile data (PGO). When available, profile data
- enables more accurate function importing decisions, as well as
- cross-module indirect call promotion.
- * Significant build-time and binary-size improvements when compiling with
- debug info (-g).
+Significant build-time and binary-size improvements when compiling with
+debug info (-g).
LLVM Coroutines
---------------
Experimental support for :doc:`Coroutines` was added, which can be enabled
-with ``-enable-coroutines`` in ``opt`` command tool or using
+with ``-enable-coroutines`` in ``opt`` the command tool or using the
``addCoroutinePassesToExtensionPoints`` API when building the optimization
pipeline.
@@ -106,18 +82,18 @@ For more information on LLVM Coroutines and the LLVM implementation, see
Regcall and Vectorcall Calling Conventions
--------------------------------------------------
-Support was added for _regcall calling convention.
-Existing __vectorcall calling convention support was extended to include
+Support was added for ``_regcall`` calling convention.
+Existing ``__vectorcall`` calling convention support was extended to include
correct handling of HVAs.
-The __vectorcall calling convention was introduced by Microsoft to
+The ``__vectorcall`` calling convention was introduced by Microsoft to
enhance register usage when passing parameters.
For more information please read `__vectorcall documentation
<https://msdn.microsoft.com/en-us/library/dn375768.aspx>`_.
-The __regcall calling convention was introduced by Intel to
+The ``__regcall`` calling convention was introduced by Intel to
optimize parameter transfer on function call.
-This calling convention ensures that as many values as possible are
+This calling convention ensures that as many values as possible are
passed or returned in registers.
For more information please read `__regcall documentation
<https://software.intel.com/en-us/node/693069>`_.
@@ -127,7 +103,7 @@ Code Generation Testing
Passes that work on the machine instruction representation can be tested with
the .mir serialization format. ``llc`` supports the ``-run-pass``,
-``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to to
+``-stop-after``, ``-stop-before``, ``-start-after``, ``-start-before`` to
run a single pass of the code generation pipeline, or to stop or start the code
generation pipeline at a given point.
@@ -211,9 +187,6 @@ changes landed in this release.
``&*I`` (if not ``end()``); alternatively, clients may refactor to use
references for known-good nodes.
-Changes to the LLVM IR
-----------------------
-
Changes to the ARM Targets
--------------------------
@@ -244,28 +217,6 @@ Changes to the ARM Targets
A lot of work has also been done in LLD for ARM, which now supports more
relocations and TLS.
-
-Changes to the MIPS Target
---------------------------
-
- During this release ...
-
-
-Changes to the PowerPC Target
------------------------------
-
- During this release ...
-
-Changes to the X86 Target
--------------------------
-
- During this release ...
-
-Changes to the AMDGPU Target
------------------------------
-
- During this release ...
-
Changes to the AVR Target
-----------------------------
@@ -297,8 +248,6 @@ Changes to the OCaml bindings
External Open Source Projects Using LLVM 4.0.0
==============================================
-* A project...
-
LDC - the LLVM-based D compiler
-------------------------------
diff --git a/include/llvm/Transforms/Vectorize/SLPVectorizer.h b/include/llvm/Transforms/Vectorize/SLPVectorizer.h
index d669a8e5b615c..4886700774ca8 100644
--- a/include/llvm/Transforms/Vectorize/SLPVectorizer.h
+++ b/include/llvm/Transforms/Vectorize/SLPVectorizer.h
@@ -92,12 +92,6 @@ private:
/// collected in GEPs.
bool vectorizeGEPIndices(BasicBlock *BB, slpvectorizer::BoUpSLP &R);
- /// Try to find horizontal reduction or otherwise vectorize a chain of binary
- /// operators.
- bool vectorizeRootInstruction(PHINode *P, Value *V, BasicBlock *BB,
- slpvectorizer::BoUpSLP &R,
- TargetTransformInfo *TTI);
-
/// \brief Scan the basic block and look for patterns that are likely to start
/// a vectorization chain.
bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);
diff --git a/lib/Target/AMDGPU/SIInstructions.td b/lib/Target/AMDGPU/SIInstructions.td
index b86c041911896..38e31e75ee67d 100644
--- a/lib/Target/AMDGPU/SIInstructions.td
+++ b/lib/Target/AMDGPU/SIInstructions.td
@@ -997,6 +997,11 @@ def : Pat <
>;
def : Pat <
+ (i1 (trunc i16:$a)),
+ (V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
+>;
+
+def : Pat <
(i1 (trunc i64:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1),
(i32 (EXTRACT_SUBREG $a, sub0))), (i32 1))
diff --git a/lib/Target/AMDGPU/VOP1Instructions.td b/lib/Target/AMDGPU/VOP1Instructions.td
index a15b9ceff2f43..8cae83cd9d1a0 100644
--- a/lib/Target/AMDGPU/VOP1Instructions.td
+++ b/lib/Target/AMDGPU/VOP1Instructions.td
@@ -607,12 +607,6 @@ def : Pat<
(COPY $src)
>;
-def : Pat<
- (i1 (trunc i16:$src)),
- (COPY $src)
->;
-
-
def : Pat <
(i16 (trunc i64:$src)),
(EXTRACT_SUBREG $src, sub0)
diff --git a/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp b/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
index 141e99b92cdc1..84f9373ae9144 100644
--- a/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
+++ b/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
@@ -41,6 +41,8 @@ STATISTIC(NumSDivs, "Number of sdiv converted to udiv");
STATISTIC(NumAShrs, "Number of ashr converted to lshr");
STATISTIC(NumSRems, "Number of srem converted to urem");
+static cl::opt<bool> DontProcessAdds("cvp-dont-process-adds", cl::init(true));
+
namespace {
class CorrelatedValuePropagation : public FunctionPass {
public:
@@ -405,6 +407,9 @@ static bool processAShr(BinaryOperator *SDI, LazyValueInfo *LVI) {
static bool processAdd(BinaryOperator *AddOp, LazyValueInfo *LVI) {
typedef OverflowingBinaryOperator OBO;
+ if (DontProcessAdds)
+ return false;
+
if (AddOp->getType()->isVectorTy() || hasLocalDefs(AddOp))
return false;
diff --git a/lib/Transforms/Scalar/Reassociate.cpp b/lib/Transforms/Scalar/Reassociate.cpp
index 181a324861e7e..65c814d7a63b8 100644
--- a/lib/Transforms/Scalar/Reassociate.cpp
+++ b/lib/Transforms/Scalar/Reassociate.cpp
@@ -1521,8 +1521,8 @@ Value *ReassociatePass::OptimizeAdd(Instruction *I,
if (ConstantInt *CI = dyn_cast<ConstantInt>(Factor)) {
if (CI->isNegative() && !CI->isMinValue(true)) {
Factor = ConstantInt::get(CI->getContext(), -CI->getValue());
- assert(!Duplicates.count(Factor) &&
- "Shouldn't have two constant factors, missed a canonicalize");
+ if (!Duplicates.insert(Factor).second)
+ continue;
unsigned Occ = ++FactorOccurrences[Factor];
if (Occ > MaxOcc) {
MaxOcc = Occ;
@@ -1534,8 +1534,8 @@ Value *ReassociatePass::OptimizeAdd(Instruction *I,
APFloat F(CF->getValueAPF());
F.changeSign();
Factor = ConstantFP::get(CF->getContext(), F);
- assert(!Duplicates.count(Factor) &&
- "Shouldn't have two constant factors, missed a canonicalize");
+ if (!Duplicates.insert(Factor).second)
+ continue;
unsigned Occ = ++FactorOccurrences[Factor];
if (Occ > MaxOcc) {
MaxOcc = Occ;
diff --git a/lib/Transforms/Vectorize/SLPVectorizer.cpp b/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 1c7cbc7edf9a3..328f270029604 100644
--- a/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -4026,40 +4026,36 @@ bool SLPVectorizerPass::tryToVectorize(BinaryOperator *V, BoUpSLP &R) {
if (!V)
return false;
- Value *P = V->getParent();
-
- // Vectorize in current basic block only.
- auto *Op0 = dyn_cast<Instruction>(V->getOperand(0));
- auto *Op1 = dyn_cast<Instruction>(V->getOperand(1));
- if (!Op0 || !Op1 || Op0->getParent() != P || Op1->getParent() != P)
- return false;
-
// Try to vectorize V.
- if (tryToVectorizePair(Op0, Op1, R))
+ if (tryToVectorizePair(V->getOperand(0), V->getOperand(1), R))
return true;
- auto *A = dyn_cast<BinaryOperator>(Op0);
- auto *B = dyn_cast<BinaryOperator>(Op1);
+ BinaryOperator *A = dyn_cast<BinaryOperator>(V->getOperand(0));
+ BinaryOperator *B = dyn_cast<BinaryOperator>(V->getOperand(1));
// Try to skip B.
if (B && B->hasOneUse()) {
- auto *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));
- auto *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));
- if (B0 && B0->getParent() == P && tryToVectorizePair(A, B0, R))
+ BinaryOperator *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));
+ BinaryOperator *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));
+ if (tryToVectorizePair(A, B0, R)) {
return true;
- if (B1 && B1->getParent() == P && tryToVectorizePair(A, B1, R))
+ }
+ if (tryToVectorizePair(A, B1, R)) {
return true;
+ }
}
// Try to skip A.
if (A && A->hasOneUse()) {
- auto *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));
- auto *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));
- if (A0 && A0->getParent() == P && tryToVectorizePair(A0, B, R))
+ BinaryOperator *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));
+ BinaryOperator *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));
+ if (tryToVectorizePair(A0, B, R)) {
return true;
- if (A1 && A1->getParent() == P && tryToVectorizePair(A1, B, R))
+ }
+ if (tryToVectorizePair(A1, B, R)) {
return true;
+ }
}
- return false;
+ return 0;
}
/// \brief Generate a shuffle mask to be used in a reduction tree.
@@ -4511,143 +4507,29 @@ static Value *getReductionValue(const DominatorTree *DT, PHINode *P,
return nullptr;
}
-namespace {
-/// Tracks instructons and its children.
-class WeakVHWithLevel final : public CallbackVH {
- /// Operand index of the instruction currently beeing analized.
- unsigned Level = 0;
- /// Is this the instruction that should be vectorized, or are we now
- /// processing children (i.e. operands of this instruction) for potential
- /// vectorization?
- bool IsInitial = true;
-
-public:
- explicit WeakVHWithLevel() = default;
- WeakVHWithLevel(Value *V) : CallbackVH(V){};
- /// Restart children analysis each time it is repaced by the new instruction.
- void allUsesReplacedWith(Value *New) override {
- setValPtr(New);
- Level = 0;
- IsInitial = true;
- }
- /// Check if the instruction was not deleted during vectorization.
- bool isValid() const { return !getValPtr(); }
- /// Is the istruction itself must be vectorized?
- bool isInitial() const { return IsInitial; }
- /// Try to vectorize children.
- void clearInitial() { IsInitial = false; }
- /// Are all children processed already?
- bool isFinal() const {
- assert(getValPtr() &&
- (isa<Instruction>(getValPtr()) &&
- cast<Instruction>(getValPtr())->getNumOperands() >= Level));
- return getValPtr() &&
- cast<Instruction>(getValPtr())->getNumOperands() == Level;
- }
- /// Get next child operation.
- Value *nextOperand() {
- assert(getValPtr() && isa<Instruction>(getValPtr()) &&
- cast<Instruction>(getValPtr())->getNumOperands() > Level);
- return cast<Instruction>(getValPtr())->getOperand(Level++);
- }
- virtual ~WeakVHWithLevel() = default;
-};
-} // namespace
-
/// \brief Attempt to reduce a horizontal reduction.
/// If it is legal to match a horizontal reduction feeding
-/// the phi node P with reduction operators Root in a basic block BB, then check
-/// if it can be done.
+/// the phi node P with reduction operators BI, then check if it
+/// can be done.
/// \returns true if a horizontal reduction was matched and reduced.
/// \returns false if a horizontal reduction was not matched.
-static bool canBeVectorized(
- PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,
- TargetTransformInfo *TTI,
- const function_ref<bool(BinaryOperator *, BoUpSLP &)> Vectorize) {
+static bool canMatchHorizontalReduction(PHINode *P, BinaryOperator *BI,
+ BoUpSLP &R, TargetTransformInfo *TTI,
+ unsigned MinRegSize) {
if (!ShouldVectorizeHor)
return false;
- if (!Root)
- return false;
-
- if (Root->getParent() != BB)
+ HorizontalReduction HorRdx(MinRegSize);
+ if (!HorRdx.matchAssociativeReduction(P, BI))
return false;
- SmallVector<WeakVHWithLevel, 8> Stack(1, Root);
- SmallSet<Value *, 8> VisitedInstrs;
- bool Res = false;
- while (!Stack.empty()) {
- Value *V = Stack.back();
- if (!V) {
- Stack.pop_back();
- continue;
- }
- auto *Inst = dyn_cast<Instruction>(V);
- if (!Inst || isa<PHINode>(Inst)) {
- Stack.pop_back();
- continue;
- }
- if (Stack.back().isInitial()) {
- Stack.back().clearInitial();
- if (auto *BI = dyn_cast<BinaryOperator>(Inst)) {
- HorizontalReduction HorRdx(R.getMinVecRegSize());
- if (HorRdx.matchAssociativeReduction(P, BI)) {
- // If there is a sufficient number of reduction values, reduce
- // to a nearby power-of-2. Can safely generate oversized
- // vectors and rely on the backend to split them to legal sizes.
- HorRdx.ReduxWidth =
- std::max((uint64_t)4, PowerOf2Floor(HorRdx.numReductionValues()));
-
- if (HorRdx.tryToReduce(R, TTI)) {
- Res = true;
- P = nullptr;
- continue;
- }
- }
- if (P) {
- Inst = dyn_cast<Instruction>(BI->getOperand(0));
- if (Inst == P)
- Inst = dyn_cast<Instruction>(BI->getOperand(1));
- if (!Inst) {
- P = nullptr;
- continue;
- }
- }
- }
- P = nullptr;
- if (Vectorize(dyn_cast<BinaryOperator>(Inst), R)) {
- Res = true;
- continue;
- }
- }
- if (Stack.back().isFinal()) {
- Stack.pop_back();
- continue;
- }
- if (auto *NextV = dyn_cast<Instruction>(Stack.back().nextOperand()))
- if (NextV->getParent() == BB && VisitedInstrs.insert(NextV).second &&
- Stack.size() < RecursionMaxDepth)
- Stack.push_back(NextV);
- }
- return Res;
-}
-
-bool SLPVectorizerPass::vectorizeRootInstruction(PHINode *P, Value *V,
- BasicBlock *BB, BoUpSLP &R,
- TargetTransformInfo *TTI) {
- if (!V)
- return false;
- auto *I = dyn_cast<Instruction>(V);
- if (!I)
- return false;
+ // If there is a sufficient number of reduction values, reduce
+ // to a nearby power-of-2. Can safely generate oversized
+ // vectors and rely on the backend to split them to legal sizes.
+ HorRdx.ReduxWidth =
+ std::max((uint64_t)4, PowerOf2Floor(HorRdx.numReductionValues()));
- if (!isa<BinaryOperator>(I))
- P = nullptr;
- // Try to match and vectorize a horizontal reduction.
- return canBeVectorized(P, I, BB, R, TTI,
- [this](BinaryOperator *BI, BoUpSLP &R) -> bool {
- return tryToVectorize(BI, R);
- });
+ return HorRdx.tryToReduce(R, TTI);
}
bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
@@ -4717,42 +4599,67 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
if (P->getNumIncomingValues() != 2)
return Changed;
+ Value *Rdx = getReductionValue(DT, P, BB, LI);
+
+ // Check if this is a Binary Operator.
+ BinaryOperator *BI = dyn_cast_or_null<BinaryOperator>(Rdx);
+ if (!BI)
+ continue;
+
// Try to match and vectorize a horizontal reduction.
- if (vectorizeRootInstruction(P, getReductionValue(DT, P, BB, LI), BB, R,
- TTI)) {
+ if (canMatchHorizontalReduction(P, BI, R, TTI, R.getMinVecRegSize())) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
+
+ Value *Inst = BI->getOperand(0);
+ if (Inst == P)
+ Inst = BI->getOperand(1);
+
+ if (tryToVectorize(dyn_cast<BinaryOperator>(Inst), R)) {
+ // We would like to start over since some instructions are deleted
+ // and the iterator may become invalid value.
+ Changed = true;
+ it = BB->begin();
+ e = BB->end();
+ continue;
+ }
+
continue;
}
- if (ShouldStartVectorizeHorAtStore) {
- if (StoreInst *SI = dyn_cast<StoreInst>(it)) {
- // Try to match and vectorize a horizontal reduction.
- if (vectorizeRootInstruction(nullptr, SI->getValueOperand(), BB, R,
- TTI)) {
- Changed = true;
- it = BB->begin();
- e = BB->end();
- continue;
+ if (ShouldStartVectorizeHorAtStore)
+ if (StoreInst *SI = dyn_cast<StoreInst>(it))
+ if (BinaryOperator *BinOp =
+ dyn_cast<BinaryOperator>(SI->getValueOperand())) {
+ if (canMatchHorizontalReduction(nullptr, BinOp, R, TTI,
+ R.getMinVecRegSize()) ||
+ tryToVectorize(BinOp, R)) {
+ Changed = true;
+ it = BB->begin();
+ e = BB->end();
+ continue;
+ }
}
- }
- }
// Try to vectorize horizontal reductions feeding into a return.
- if (ReturnInst *RI = dyn_cast<ReturnInst>(it)) {
- if (RI->getNumOperands() != 0) {
- // Try to match and vectorize a horizontal reduction.
- if (vectorizeRootInstruction(nullptr, RI->getOperand(0), BB, R, TTI)) {
- Changed = true;
- it = BB->begin();
- e = BB->end();
- continue;
+ if (ReturnInst *RI = dyn_cast<ReturnInst>(it))
+ if (RI->getNumOperands() != 0)
+ if (BinaryOperator *BinOp =
+ dyn_cast<BinaryOperator>(RI->getOperand(0))) {
+ DEBUG(dbgs() << "SLP: Found a return to vectorize.\n");
+ if (canMatchHorizontalReduction(nullptr, BinOp, R, TTI,
+ R.getMinVecRegSize()) ||
+ tryToVectorizePair(BinOp->getOperand(0), BinOp->getOperand(1),
+ R)) {
+ Changed = true;
+ it = BB->begin();
+ e = BB->end();
+ continue;
+ }
}
- }
- }
// Try to vectorize trees that start at compare instructions.
if (CmpInst *CI = dyn_cast<CmpInst>(it)) {
@@ -4765,14 +4672,16 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
continue;
}
- for (int I = 0; I < 2; ++I) {
- if (vectorizeRootInstruction(nullptr, CI->getOperand(I), BB, R, TTI)) {
- Changed = true;
- // We would like to start over since some instructions are deleted
- // and the iterator may become invalid value.
- it = BB->begin();
- e = BB->end();
- break;
+ for (int i = 0; i < 2; ++i) {
+ if (BinaryOperator *BI = dyn_cast<BinaryOperator>(CI->getOperand(i))) {
+ if (tryToVectorizePair(BI->getOperand(0), BI->getOperand(1), R)) {
+ Changed = true;
+ // We would like to start over since some instructions are deleted
+ // and the iterator may become invalid value.
+ it = BB->begin();
+ e = BB->end();
+ break;
+ }
}
}
continue;
diff --git a/test/CodeGen/AMDGPU/trunc.ll b/test/CodeGen/AMDGPU/trunc.ll
index 63bb447df2fde..2c2ce4c5d3518 100644
--- a/test/CodeGen/AMDGPU/trunc.ll
+++ b/test/CodeGen/AMDGPU/trunc.ll
@@ -1,13 +1,15 @@
-; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=SI %s
+; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck -check-prefix=GCN -check-prefix=SI %s
+; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs< %s | FileCheck -check-prefix=GCN -check-prefix=VI %s
; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG %s
declare i32 @llvm.r600.read.tidig.x() nounwind readnone
define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) {
-; SI-LABEL: {{^}}trunc_i64_to_i32_store:
-; SI: s_load_dword [[SLOAD:s[0-9]+]], s[0:1], 0xb
-; SI: v_mov_b32_e32 [[VLOAD:v[0-9]+]], [[SLOAD]]
+; GCN-LABEL: {{^}}trunc_i64_to_i32_store:
+; GCN: s_load_dword [[SLOAD:s[0-9]+]], s[0:1],
+; GCN: v_mov_b32_e32 [[VLOAD:v[0-9]+]], [[SLOAD]]
; SI: buffer_store_dword [[VLOAD]]
+; VI: flat_store_dword v[{{[0-9:]+}}], [[VLOAD]]
; EG-LABEL: {{^}}trunc_i64_to_i32_store:
; EG: MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1
@@ -18,12 +20,14 @@ define void @trunc_i64_to_i32_store(i32 addrspace(1)* %out, i64 %in) {
ret void
}
-; SI-LABEL: {{^}}trunc_load_shl_i64:
-; SI-DAG: s_load_dwordx2
-; SI-DAG: s_load_dword [[SREG:s[0-9]+]],
-; SI: s_lshl_b32 [[SHL:s[0-9]+]], [[SREG]], 2
-; SI: v_mov_b32_e32 [[VSHL:v[0-9]+]], [[SHL]]
-; SI: buffer_store_dword [[VSHL]],
+; GCN-LABEL: {{^}}trunc_load_shl_i64:
+; GCN-DAG: s_load_dwordx2
+; GCN-DAG: s_load_dword [[SREG:s[0-9]+]],
+; GCN: s_lshl_b32 [[SHL:s[0-9]+]], [[SREG]], 2
+; GCN: v_mov_b32_e32 [[VSHL:v[0-9]+]], [[SHL]]
+; SI: buffer_store_dword [[VSHL]]
+; VI: flat_store_dword v[{{[0-9:]+}}], [[VSHL]]
+
define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) {
%b = shl i64 %a, 2
%result = trunc i64 %b to i32
@@ -31,15 +35,17 @@ define void @trunc_load_shl_i64(i32 addrspace(1)* %out, i64 %a) {
ret void
}
-; SI-LABEL: {{^}}trunc_shl_i64:
+; GCN-LABEL: {{^}}trunc_shl_i64:
; SI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd
-; SI: s_lshl_b64 s{{\[}}[[LO_SHL:[0-9]+]]:{{[0-9]+\]}}, s{{\[}}[[LO_SREG]]:{{[0-9]+\]}}, 2
-; SI: s_add_u32 s[[LO_SREG2:[0-9]+]], s[[LO_SHL]],
-; SI: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG2]]
-; SI: s_addc_u32
+; VI: s_load_dwordx2 s{{\[}}[[LO_SREG:[0-9]+]]:{{[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x34
+; GCN: s_lshl_b64 s{{\[}}[[LO_SHL:[0-9]+]]:{{[0-9]+\]}}, s{{\[}}[[LO_SREG]]:{{[0-9]+\]}}, 2
+; GCN: s_add_u32 s[[LO_SREG2:[0-9]+]], s[[LO_SHL]],
+; GCN: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG2]]
+; GCN: s_addc_u32
; SI: buffer_store_dword v[[LO_VREG]],
-; SI: v_mov_b32_e32
-; SI: v_mov_b32_e32
+; VI: flat_store_dword v[{{[0-9:]+}}], v[[LO_VREG]]
+; GCN: v_mov_b32_e32
+; GCN: v_mov_b32_e32
define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64 %a) {
%aa = add i64 %a, 234 ; Prevent shrinking store.
%b = shl i64 %aa, 2
@@ -49,9 +55,9 @@ define void @trunc_shl_i64(i64 addrspace(1)* %out2, i32 addrspace(1)* %out, i64
ret void
}
-; SI-LABEL: {{^}}trunc_i32_to_i1:
-; SI: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
-; SI: v_cmp_eq_u32
+; GCN-LABEL: {{^}}trunc_i32_to_i1:
+; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
+; GCN: v_cmp_eq_u32
define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) {
%a = load i32, i32 addrspace(1)* %ptr, align 4
%trunc = trunc i32 %a to i1
@@ -60,9 +66,30 @@ define void @trunc_i32_to_i1(i32 addrspace(1)* %out, i32 addrspace(1)* %ptr) {
ret void
}
-; SI-LABEL: {{^}}sgpr_trunc_i32_to_i1:
-; SI: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
-; SI: v_cmp_eq_u32
+; GCN-LABEL: {{^}}trunc_i8_to_i1:
+; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, v{{[0-9]+}}
+; GCN: v_cmp_eq_u32
+define void @trunc_i8_to_i1(i8 addrspace(1)* %out, i8 addrspace(1)* %ptr) {
+ %a = load i8, i8 addrspace(1)* %ptr, align 4
+ %trunc = trunc i8 %a to i1
+ %result = select i1 %trunc, i8 1, i8 0
+ store i8 %result, i8 addrspace(1)* %out, align 4
+ ret void
+}
+
+; GCN-LABEL: {{^}}sgpr_trunc_i16_to_i1:
+; GCN: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
+; GCN: v_cmp_eq_u32
+define void @sgpr_trunc_i16_to_i1(i16 addrspace(1)* %out, i16 %a) {
+ %trunc = trunc i16 %a to i1
+ %result = select i1 %trunc, i16 1, i16 0
+ store i16 %result, i16 addrspace(1)* %out, align 4
+ ret void
+}
+
+; GCN-LABEL: {{^}}sgpr_trunc_i32_to_i1:
+; GCN: s_and_b32 s{{[0-9]+}}, 1, s{{[0-9]+}}
+; GCN: v_cmp_eq_u32
define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
%trunc = trunc i32 %a to i1
%result = select i1 %trunc, i32 1, i32 0
@@ -70,11 +97,12 @@ define void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
ret void
}
-; SI-LABEL: {{^}}s_trunc_i64_to_i1:
+; GCN-LABEL: {{^}}s_trunc_i64_to_i1:
; SI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0xb
-; SI: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]]
-; SI: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}}
-; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]]
+; VI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0x2c
+; GCN: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]]
+; GCN: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}}
+; GCN: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]]
define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) {
%trunc = trunc i64 %x to i1
%sel = select i1 %trunc, i32 63, i32 -12
@@ -82,11 +110,12 @@ define void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 %x) {
ret void
}
-; SI-LABEL: {{^}}v_trunc_i64_to_i1:
+; GCN-LABEL: {{^}}v_trunc_i64_to_i1:
; SI: buffer_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}}
-; SI: v_and_b32_e32 [[MASKED:v[0-9]+]], 1, v[[VLO]]
-; SI: v_cmp_eq_u32_e32 vcc, 1, [[MASKED]]
-; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, vcc
+; VI: flat_load_dwordx2 v{{\[}}[[VLO:[0-9]+]]:{{[0-9]+\]}}
+; GCN: v_and_b32_e32 [[MASKED:v[0-9]+]], 1, v[[VLO]]
+; GCN: v_cmp_eq_u32_e32 vcc, 1, [[MASKED]]
+; GCN: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, vcc
define void @v_trunc_i64_to_i1(i32 addrspace(1)* %out, i64 addrspace(1)* %in) {
%tid = call i32 @llvm.r600.read.tidig.x() nounwind readnone
%gep = getelementptr i64, i64 addrspace(1)* %in, i32 %tid
diff --git a/test/Transforms/CorrelatedValuePropagation/add.ll b/test/Transforms/CorrelatedValuePropagation/add.ll
index 4b436ff9a4014..0ba521c894e2f 100644
--- a/test/Transforms/CorrelatedValuePropagation/add.ll
+++ b/test/Transforms/CorrelatedValuePropagation/add.ll
@@ -1,4 +1,4 @@
-; RUN: opt < %s -correlated-propagation -S | FileCheck %s
+; RUN: opt < %s -correlated-propagation -cvp-dont-process-adds=false -S | FileCheck %s
; CHECK-LABEL: @test0(
define void @test0(i32 %a) {
diff --git a/test/Transforms/Reassociate/basictest.ll b/test/Transforms/Reassociate/basictest.ll
index 11c67bea2cb0a..4703fd7621b63 100644
--- a/test/Transforms/Reassociate/basictest.ll
+++ b/test/Transforms/Reassociate/basictest.ll
@@ -222,3 +222,23 @@ define i32 @test15(i32 %X1, i32 %X2, i32 %X3) {
; CHECK-LABEL: @test15
; CHECK: and i1 %A, %B
}
+
+; PR30256 - previously this asserted.
+; CHECK-LABEL: @test16
+; CHECK: %[[FACTOR:.*]] = mul i64 %a, -4
+; CHECK-NEXT: %[[RES:.*]] = add i64 %[[FACTOR]], %b
+; CHECK-NEXT: ret i64 %[[RES]]
+define i64 @test16(i1 %cmp, i64 %a, i64 %b) {
+entry:
+ %shl = shl i64 %a, 1
+ %shl.neg = sub i64 0, %shl
+ br i1 %cmp, label %if.then, label %if.end
+
+if.then: ; preds = %entry
+ %add1 = add i64 %shl.neg, %shl.neg
+ %add2 = add i64 %add1, %b
+ ret i64 %add2
+
+if.end: ; preds = %entry
+ ret i64 0
+}
diff --git a/test/Transforms/SLPVectorizer/X86/horizontal-list.ll b/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
index 07064304bf019..3f9fffb9b6248 100644
--- a/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
+++ b/test/Transforms/SLPVectorizer/X86/horizontal-list.ll
@@ -12,25 +12,26 @@ define float @baz() {
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
-; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
-; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
-; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
-; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
-; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]]
-; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
-; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]]
-; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
-; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
-; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
-; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
-; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]]
-; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
-; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]]
+; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
+; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
+; CHECK-NEXT: [[MUL4:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
+; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL4]], [[CONV]]
+; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
+; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
+; CHECK-NEXT: [[MUL4_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
+; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD]]
+; CHECK-NEXT: [[TMP5:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
+; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
+; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP6]], [[TMP5]]
+; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0
+; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP8]], [[ADD_1]]
+; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP7]], i32 1
+; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP9]], [[ADD_2]]
; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV]]
-; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[TMP4]], [[ADD7]]
-; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[TMP5]], [[ADD19]]
-; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP9]], [[ADD19_1]]
-; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP10]], [[ADD19_2]]
+; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[MUL4]], [[ADD7]]
+; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD19]]
+; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP8]], [[ADD19_1]]
+; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP9]], [[ADD19_2]]
; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4
; CHECK-NEXT: ret float [[ADD19_3]]
;
@@ -69,37 +70,40 @@ define float @bazz() {
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
-; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16
-; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16
-; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]
-; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
-; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP4]], [[CONV]]
-; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
-; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP5]], [[ADD]]
-; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8
-; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8
-; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]
-; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
-; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP9]], [[ADD_1]]
-; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
-; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[TMP10]], [[ADD_2]]
+; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
+; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
+; CHECK-NEXT: [[MUL4:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
+; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL4]], [[CONV]]
+; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
+; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
+; CHECK-NEXT: [[MUL4_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
+; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[MUL4_1]], [[ADD]]
+; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
+; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
+; CHECK-NEXT: [[MUL4_2:%.*]] = fmul fast float [[TMP6]], [[TMP5]]
+; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[MUL4_2]], [[ADD_1]]
+; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
+; CHECK-NEXT: [[TMP8:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
+; CHECK-NEXT: [[MUL4_3:%.*]] = fmul fast float [[TMP8]], [[TMP7]]
+; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float [[MUL4_3]], [[ADD_2]]
; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV6]]
-; CHECK-NEXT: [[TMP11:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 4) to <2 x float>*), align 16
-; CHECK-NEXT: [[TMP12:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 4) to <2 x float>*), align 16
-; CHECK-NEXT: [[TMP13:%.*]] = fmul fast <2 x float> [[TMP12]], [[TMP11]]
-; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x float> [[TMP13]], i32 0
-; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[TMP14]], [[ADD7]]
-; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP13]], i32 1
-; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[TMP15]], [[ADD19]]
-; CHECK-NEXT: [[TMP16:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 6) to <2 x float>*), align 8
-; CHECK-NEXT: [[TMP17:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 6) to <2 x float>*), align 8
-; CHECK-NEXT: [[TMP18:%.*]] = fmul fast <2 x float> [[TMP17]], [[TMP16]]
-; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x float> [[TMP18]], i32 0
-; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP19]], [[ADD19_1]]
-; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x float> [[TMP18]], i32 1
-; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP20]], [[ADD19_2]]
+; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 4), align 16
+; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 4), align 16
+; CHECK-NEXT: [[MUL18:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
+; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float [[MUL18]], [[ADD7]]
+; CHECK-NEXT: [[TMP11:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 5), align 4
+; CHECK-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 5), align 4
+; CHECK-NEXT: [[MUL18_1:%.*]] = fmul fast float [[TMP12]], [[TMP11]]
+; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float [[MUL18_1]], [[ADD19]]
+; CHECK-NEXT: [[TMP13:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 6) to <2 x float>*), align 8
+; CHECK-NEXT: [[TMP14:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 6) to <2 x float>*), align 8
+; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x float> [[TMP14]], [[TMP13]]
+; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0
+; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float [[TMP16]], [[ADD19_1]]
+; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1
+; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float [[TMP17]], [[ADD19_2]]
; CHECK-NEXT: store float [[ADD19_3]], float* @res, align 4
; CHECK-NEXT: ret float [[ADD19_3]]
;
@@ -151,20 +155,24 @@ define float @bazzz() {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
-; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
-; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
-; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
-; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
-; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
-; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
-; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]
-; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]
-; CHECK-NEXT: store float [[TMP8]], float* @res, align 4
-; CHECK-NEXT: ret float [[TMP8]]
+; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
+; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
+; CHECK-NEXT: [[MUL:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
+; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
+; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
+; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
+; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float [[MUL_1]], [[MUL]]
+; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
+; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
+; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[TMP7]], [[TMP6]]
+; CHECK-NEXT: [[TMP8:%.*]] = fadd fast float [[MUL_2]], [[TMP5]]
+; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
+; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
+; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
+; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[MUL_3]], [[TMP8]]
+; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[CONV]], [[TMP11]]
+; CHECK-NEXT: store float [[TMP12]], float* @res, align 4
+; CHECK-NEXT: ret float [[TMP12]]
;
entry:
%0 = load i32, i32* @n, align 4
@@ -194,19 +202,23 @@ define i32 @foo() {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
-; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
-; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
-; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
-; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
-; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
-; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
-; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
-; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
-; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]
-; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]
-; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP8]] to i32
+; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
+; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
+; CHECK-NEXT: [[MUL:%.*]] = fmul fast float [[TMP2]], [[TMP1]]
+; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
+; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
+; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast float [[TMP4]], [[TMP3]]
+; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float [[MUL_1]], [[MUL]]
+; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8
+; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8
+; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast float [[TMP7]], [[TMP6]]
+; CHECK-NEXT: [[TMP8:%.*]] = fadd fast float [[MUL_2]], [[TMP5]]
+; CHECK-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4
+; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4
+; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast float [[TMP10]], [[TMP9]]
+; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[MUL_3]], [[TMP8]]
+; CHECK-NEXT: [[TMP12:%.*]] = fmul fast float [[CONV]], [[TMP11]]
+; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP12]] to i32
; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4
; CHECK-NEXT: ret i32 [[CONV4]]
;