Clang-cl from the release_90 branch crashes while compiling Firefox for aarch64-windows. Unfortunately I couldn't reproduce this locally, it only crashes with Mozilla automation's clang builds, which don't publish debug symbols. By tracing execution side-by-side with the crashy build and my own build that has symbols, I narrowed this down to an unhandled switch case in `GetNeonType`. The `TypeFlags.Flags` are -1. Reduced buffer.c: foo() { _InterlockedExchangeAdd64(0, -1); } Command line from the reproducer script: "clang-cl.exe" "-cc1" "-triple" "aarch64-unknown-windows-msvc19.16.27026" "-emit-obj" "-mincremental-linker-compatible" "-disable-free" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "buffer.c" "-mrelocation-model" "static" "-mthread-model" "posix" "-mdisable-fp-elim" "-relaxed-aliasing" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "generic" "-target-feature" "+neon" "-target-abi" "aapcs" "-fallow-half-arguments-and-returns" "-D_MT" "-D_DLL" "--dependent-lib=msvcrt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fdiagnostics-format" "msvc" "-gcodeview" "-debug-info-kind=limited" "-ffunction-sections" "-fdata-sections" "-coverage-notes-file" "z:\\build\\build\\src\\obj-firefox\\media\\ffvpx\\libavutil\\buffer.gcno" "-D" "DEBUG=1" "-D" "_USE_MATH_DEFINES" "-D" "inline=__inline" "-D" "HAVE_AV_CONFIG_H" "-D" "ASSERT_LEVEL=2" "-D" "MOZILLA_CLIENT" "-D" "_HAS_EXCEPTIONS=0" "-O2" "-Wall" "-Wno-unknown-pragmas" "-Wno-ignored-pragmas" "-Wno-deprecated-declarations" "-Wno-invalid-noreturn" "-Wno-parentheses" "-Wno-pointer-sign" "-Wno-sign-compare" "-Wno-switch" "-Wno-type-limits" "-Wno-unused-function" "-Wno-deprecated-declarations" "-Wno-absolute-value" "-Wno-incompatible-pointer-types" "-Wno-string-conversion" "-Wno-visibility" "-Wno-inconsistent-dllimport" "-Wno-macro-redefined" "-ferror-limit" "19" "-fmessage-length" "0" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.16.27026" "-fdelayed-template-parsing" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-vectorize-loops" "-vectorize-slp" "-std=gnu99" "-faddrsig" "-x" "c" "buffer.c" Stack (the top frame is actually GetNeonType, the symbols are missing): 00 clang_cl!clang::CodeGen::CodeGenFunction::EmitCommonNeonBuiltinExpr 01 clang_cl!clang::CodeGen::CodeGenFunction::EmitAArch64BuiltinExpr 02 clang_cl!clang::CodeGen::CodeGenFunction::EmitBuiltinExpr 03 clang_cl!clang::CodeGen::CodeGenFunction::EmitCallExpr 04 clang_cl!clang::CodeGen::CodeGenFunction::EmitCheckedInBoundsGEP 05 clang_cl!clang::CodeGen::CodeGenFunction::EmitScalarExpr 06 clang_cl!clang::CodeGen::CodeGenFunction::EmitAnyExpr 07 clang_cl!clang::CodeGen::CodeGenFunction::EmitIgnoredExpr 08 clang_cl!clang::CodeGen::CodeGenFunction::EmitStmt 09 clang_cl!clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope 0a clang_cl!clang::CodeGen::CodeGenFunction::EmitFunctionBody 0b clang_cl!clang::CodeGen::CodeGenFunction::GenerateCode 0c clang_cl!clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition 0d clang_cl!clang::CodeGen::CodeGenModule::EmitGlobalDefinition 0e clang_cl!clang::CodeGen::CodeGenModule::EmitGlobal 0f clang_cl!clang::CodeGen::CodeGenModule::EmitTopLevelDecl 10 clang_cl!clang::CreateLLVMCodeGen 11 clang_cl!clang::BackendConsumer::HandleTopLevelDecl 12 clang_cl!clang::ParseAST 13 clang_cl!clang::FrontendAction::Execute 14 clang_cl!clang::CompilerInstance::ExecuteAction 15 clang_cl!clang::ExecuteCompilerInvocation 16 clang_cl!clang::ChainedDiagnosticConsumer::HandleDiagnostic 17 clang_cl 18 clang_cl!llvm::itanium_demangle::OutputStream::writeUnsigned 19 KERNEL32!BaseThreadInitThunk 1a ntdll!RtlUserThreadStart
> Unfortunately I couldn't reproduce this locally, it only crashes with > Mozilla automation's clang builds, which don't publish debug symbols. By > tracing execution side-by-side with the crashy build and my own build that > has symbols, I narrowed this down to an unhandled switch case in > `GetNeonType`. The `TypeFlags.Flags` are -1. Digging a bit further: in my local build (a single-stage build compiled with clang 8.0.0) the `TypeFlags.Flags` are also -1, but I end up taking an innocuous branch of the switch. In the 3-stage automation build of 9.0, the -1 value sends us through a jump table to a bogus address.
(In reply to David Major from comment #1) > > Unfortunately I couldn't reproduce this locally, it only crashes with > > Mozilla automation's clang builds, which don't publish debug symbols. By > > tracing execution side-by-side with the crashy build and my own build that > > has symbols, I narrowed this down to an unhandled switch case in > > `GetNeonType`. The `TypeFlags.Flags` are -1. > > Digging a bit further: in my local build (a single-stage build compiled with > clang 8.0.0) the `TypeFlags.Flags` are also -1, but I end up taking an > innocuous branch of the switch. In the 3-stage automation build of 9.0, the > -1 value sends us through a jump table to a bogus address. I'm guess it's because of http://llvm.org/r357067 But the underlying problem of having a case that's not covered by the switch in GetNeonType() is what needs to be fixed.
Excerpt from EmitAArch64BuiltinExpr: llvm::APSInt Result; const Expr *Arg = E->getArg(E->getNumArgs()-1); NeonTypeFlags Type(0); if (Arg->isIntegerConstantExpr(Result, getContext())) // Determine the type of this overloaded NEON intrinsic. Type = NeonTypeFlags(Result.getZExtValue()); This looks at the last argument in `_InterlockedExchangeAdd64(0, -1);` and puts that into `Type` which is what the switch will key off of in `GetNeonType`. Just to confirm my understanding I tried changing the reproducer to use -2, and that indeed became the new value of `Type`. This doesn't seem right, it doesn't make sense to treat an integer literal from user-provided source as a `class NeonTypeFlags`. (But I don't know what _would_ be the right thing here.)
Elsewhere in the source is another occurrence of that pattern, this time prefaced with the comment: // Get the last argument, which specifies the vector type. That's not true for `_InterlockedExchangeAdd64`.
So it looks like, the `_InterlockedExchangeAdd64` family doesn't even care about this `GetNeonType` business. It looks like they normally just hit the cases like `case AArch64::BI_InterlockedExchangeAdd64` which don't make use of `Type`. I wonder if this is just an ordering/logic issue in `EmitAArch64BuiltinExpr`? (It is certainly a tough function to follow along with.) Should such intrinsics just avoid calling `GetNeonType`?
I'm testing out a patch that does a bulk move of the `case AArch64::BI_...` region up into the earlier switch labeled `Handle non-overloaded intrinsics first.` -- in particular this would make those cases be handled before the NeonType stuff. But at this point it's going to go into next week. FWIW on [32-bit] ARM, the relative ordering of the code regions seems OK as-is.
https://reviews.llvm.org/rL367323 (I'll leave the bug open to track the merge)
(In reply to David Major from comment #7) > https://reviews.llvm.org/rL367323 > > (I'll leave the bug open to track the merge) Merged to release_90 in r367525. Thanks!