I discovered this while staring at Microsoft's platform adaptation layer code in ChakraCore:
https://github.com/Microsoft/ChakraCore/blob/master/lib/Common/CommonPal.h#L647

__forceinline unsigned char _BitTestAndSet(LONG *_BitBase, int _BitPos)
{
#if defined(__clang__) && !defined(_ARM_)
    // Clang doesn't expand _bittestandset intrinic to bts, and it's implemention also doesn't work for _BitPos >= 32
    unsigned char retval = 0;
    asm(
        "bts %[_BitPos], %[_BitBase]\n\t"
        "setc %b[retval]\n\t"
        : [_BitBase] "+m" (*_BitBase), [retval] "+rm" (retval)
        : [_BitPos] "ri" (_BitPos)
        : "cc" // clobber condition code
    );
    return retval;
#else
    return _bittestandset(_BitBase, _BitPos);
#endif
}

It's a shame that nobody filed this bug upstream. :(

The Intel manual confirms this view:

"""
BT—Bit Test
...
Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset (specified by the second operand) and stores the value of the bit in the CF flag. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:
• If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit mode).
• If the bit base operand specifies a memory location, the operand represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be referenced by the offset operand depends on the operand size.
See also: Bit(BitBase, BitOffset) on page 3-11.
"""

We either need to codegen this with an intrinsic or inline asm that will reliably select to bts, or we need to do an array indexing operation first.
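For reference, here is a rough sketch in portable C of what the "array indexing operation first" option might look like. The function name bit_test_and_set_sketch is made up for illustration and is not a Clang or MSVC API. The sketch assumes a 32-bit LONG (modeled as int32_t) and, going by my reading of the manual's memory-operand description, treats the bit position as a signed offset from the base. Like a plain bts without a lock prefix, it is not atomic.

```c
#include <stdint.h>

/* Hypothetical helper: models the memory-operand behavior of bts by
 * selecting the 32-bit word first and then testing/setting the bit
 * inside that word. */
static unsigned char bit_test_and_set_sketch(int32_t *base, int32_t pos)
{
    /* Floor-divide pos by 32 so that negative positions index backwards
     * from base (C's / and % truncate toward zero, so adjust). */
    int32_t word = pos / 32;
    int32_t bit = pos % 32;
    if (bit < 0) {
        bit += 32;
        word -= 1;
    }

    /* Work on the unsigned view of the word to avoid signed-shift issues
     * when bit == 31. */
    uint32_t *p = (uint32_t *)(base + word);
    uint32_t mask = (uint32_t)1 << bit;

    unsigned char old = (*p & mask) != 0; /* previous value of the bit */
    *p |= mask;                           /* set the bit */
    return old;
}
```

With a two-element array, a call with position 35 would touch bit 3 of the second element, which is the case the ChakraCore comment says Clang's intrinsic got wrong.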
There was a patch for this, but it got stuck: https://reviews.llvm.org/D33616
This should be fixed after r333978, r334059, and r334060.