For the C code below,

    #include <arm_neon.h>

    uint16x8_t test1(uint16x8_t a, uint16x8_t b, uint32_t c) {
      return vmlaq_n_u16(a, b, vqrshrns_n_u32(c, 11));
    }

Clang generates the following error message:

    LLVM ERROR: Cannot select: 0x33deb60: v8i16 = AArch64ISD::NEON_VDUP 0x33dd840 [ORD=4] [ID=7]
      0x33dd840: i64 = TargetConstant<127> [ID=3]
    In function: test1

For another case,

    #include <arm_neon.h>

    uint16x8_t test2(uint16x8_t a, uint16x8_t b, uint32_t c) {
      return vsubl_high_u16(a, vmlaq_n_u16(a, b, vqrshrns_n_u32(c, 11)));
    }

Clang hits an assertion:

    /include/llvm/CodeGen/SelectionDAGNodes.h:559: const llvm::SDValue& llvm::SDNode::getOperand(unsigned int) const: Assertion `Num < NumOperands && "Invalid child # of SDNode!"' failed.

If r203229 is reverted, both cases pass.
This is a pair of bugs in AArch64's LowerCONCAT_VECTORS, exposed by slightly different incoming code.

1. The NEON_VDUP optimisation was far too aggressive, assuming (I think) that the input would always be a BUILD_VECTOR. In this case it was interpreting the intrinsic ID as the number being duplicated.
2. We were treating most unknown concats as legal (by returning Op rather than SDValue). I think only concats of pairs of vectors are actually legal.

I've committed a patch that fixes this as r203450, but it looks like the code could be improved by combining some of the CONCAT_VECTORS and BUILD_VECTOR handling in these weird cases (the CONCAT is essentially a BUILD, just with a pseudovector input rather than a real vector).
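For checking the output of test1 once it compiles, a portable scalar model of the two intrinsics may be useful. This is only a sketch based on the documented semantics of vqrshrns_n_u32 (unsigned saturating rounding shift right, narrowed to 16 bits) and vmlaq_n_u16 (per-lane multiply-accumulate by a scalar, wrapping modulo 2^16). The names ref_vqrshrns_n_u32, ref_vmlaq_n_u16 and ref_test1 are hypothetical helpers, not part of arm_neon.h or LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vqrshrns_n_u32(c, n): add the rounding
   constant, shift right by n, then saturate to the 16-bit range.
   The sum is formed in 64 bits so the rounding add cannot overflow. */
static uint16_t ref_vqrshrns_n_u32(uint32_t c, unsigned n) {
    assert(n >= 1 && n <= 16);   /* valid immediate range for this intrinsic */
    uint64_t rounded = ((uint64_t)c + (1ull << (n - 1))) >> n;
    return rounded > UINT16_MAX ? UINT16_MAX : (uint16_t)rounded;
}

/* Hypothetical scalar model of vmlaq_n_u16 over 8 lanes:
   out[i] = a[i] + b[i] * c, truncated to 16 bits. */
static void ref_vmlaq_n_u16(uint16_t out[8], const uint16_t a[8],
                            const uint16_t b[8], uint16_t c) {
    for (int i = 0; i < 8; ++i)
        out[i] = (uint16_t)(a[i] + (uint32_t)b[i] * c);
}

/* Scalar counterpart of test1 from the report. */
static void ref_test1(uint16_t out[8], const uint16_t a[8],
                      const uint16_t b[8], uint32_t c) {
    ref_vmlaq_n_u16(out, a, b, ref_vqrshrns_n_u32(c, 11));
}
```

Comparing the lanes of the vector returned by test1 against ref_test1 for a handful of inputs, including values of c large enough to trigger saturation, gives a quick sanity check that the lowered code duplicates the narrowed scalar rather than some unrelated constant.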