ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll is flaky #50408

Open
nico opened this issue Jul 12, 2021 · 16 comments
Labels
bugzilla Issues migrated from bugzilla orcjit

Comments

@nico
Contributor

nico commented Jul 12, 2021

Bugzilla Link 51064
Version trunk
OS Windows NT
CC @AlexDenisov,@lhames

Extended Description

I saw it fail twice recently, but had never seen it fail before, so this might be a new thing.

http://45.33.8.238/mac/33411/step_11.txt , at rev f192616, 2021 Jun 12

https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8842029280494205536/+/u/package_clang/stdout?format=raw , at rev d5c0b9c , 2021 Jun 11

Some tests will be skipped and the --timeout command line argument will not work.
-- Testing: 72841 tests, 12 workers --
Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60..
FAIL: LLVM :: ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll (49812 of 72841)
******************** TEST 'LLVM :: ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll' FAILED ********************
Script:

: 'RUN: at line 1'; /opt/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /opt/s/w/ir/cache/builder/src/third_party/llvm/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll | /opt/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/bin/FileCheck /opt/s/w/ir/cache/builder/src/third_party/llvm/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll

Exit Code: 2

Command Output (stderr):

FileCheck error: '<stdin>' is empty.
FileCheck command line: /opt/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/bin/FileCheck /opt/s/w/ir/cache/builder/src/third_party/llvm/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll

@nico
Contributor Author

nico commented Jul 13, 2021

Another example: http://45.33.8.238/mac/33435/step_11.txt

@lhames
Contributor

lhames commented Jul 13, 2021

Thanks Nico. I should be able to look at this tomorrow or the next day (AEST).

If it's causing major disruption in the meantime, I think it's reasonable to disable the test; it isn't heavily relied on at the moment.

If you do disable the test, could you note the commit hash for that in this bug?

@nico
Contributor Author

nico commented Jul 26, 2021

Here's another failure: http://45.33.8.238/mac/33879/step_11.txt

It doesn't fail all that often (every few days at most), so if you can look into it soon I think it's fine to keep the test enabled until then.

@nico
Contributor Author

nico commented Aug 25, 2021

Another failure today: http://45.33.8.238/mac/34774/step_11.txt

(I paste the ones I happen to notice; chances are it fails more frequently than I post comments :) )

@nico
Contributor Author

nico commented Sep 11, 2021

And another: http://45.33.8.238/mac/35260/step_11.txt

@nico
Contributor Author

nico commented Sep 13, 2021

@llvmbot
Collaborator

llvmbot commented Sep 20, 2021

@nico
Contributor Author

nico commented Sep 21, 2021

lhames, given that this fails on other bots too, do you think it's time to disable the test for now?

@lhames
Contributor

lhames commented Sep 26, 2021

@lhames
Contributor

lhames commented Sep 26, 2021

Got it!

The stub manager is being torn down before the worker threads, but they might (if the scheduling is unlucky) still be using it.

I'll fix up the teardown order and it should fix this.

-- Lang.
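The teardown-order hypothesis above can be illustrated with a minimal C++ sketch. It is not ORC code: `SharedTable` and `Engine` are hypothetical names standing in for the stub manager and its owner. It shows one common way to guarantee that worker threads finish before a shared resource is destroyed: declare the resource before the threads, and join in the owner's destructor.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared resource such as a stub manager.
struct SharedTable {
  std::atomic<int> lookups{0};
  void lookup() { lookups.fetch_add(1, std::memory_order_relaxed); }
};

// Hypothetical owner of both the shared resource and the worker threads.
class Engine {
public:
  Engine(unsigned numWorkers, int lookupsPerWorker) {
    workers_.reserve(numWorkers);
    for (unsigned i = 0; i < numWorkers; ++i)
      workers_.emplace_back([this, lookupsPerWorker] {
        for (int j = 0; j < lookupsPerWorker; ++j)
          table_.lookup();
      });
  }

  // The destructor body runs before any member is destroyed, so joining
  // here guarantees no worker touches table_ after it is gone.
  ~Engine() { joinAll(); }

  void joinAll() {
    for (std::thread &w : workers_)
      if (w.joinable())
        w.join();
  }

  int totalLookups() const { return table_.lookups.load(); }

private:
  // Members are destroyed in reverse declaration order:
  // workers_ first, table_ last.
  SharedTable table_;
  std::vector<std::thread> workers_;
};
```

If the two members were declared in the opposite order and the destructor did not join, `table_` could be destroyed while a worker was still calling `lookup()`, which is the kind of unlucky scheduling the comment describes.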

@lhames
Contributor

lhames commented Sep 26, 2021

Oh, scratch that. The shutdown order probably is worth looking into, but I don't think it's the source of this issue. It looks like the locking operations were dropped from CompileOnDemandLayer during one of the rewrites. I think adding them back in will fix this.
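For readers unfamiliar with the pattern, here is a minimal sketch of what "locking operations" around shared per-layer state generally look like. It does not reproduce CompileOnDemandLayer: `SymbolTable`, `getOrCreate`, and `hammer` are hypothetical names, and the only point is that every access to the shared map goes through one mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared state that several compile threads might touch.
class SymbolTable {
public:
  // Returns the id for name, assigning a fresh one on first request.
  std::uint64_t getOrCreate(const std::string &name) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = ids_.find(name);
    if (it != ids_.end())
      return it->second;
    const std::uint64_t id = nextId_++;
    ids_.emplace(name, id);
    return id;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return ids_.size();
  }

private:
  mutable std::mutex mutex_; // guards ids_ and nextId_
  std::map<std::string, std::uint64_t> ids_;
  std::uint64_t nextId_ = 0;
};

// Requests the same set of names from several threads at once and
// returns the number of distinct entries afterwards.
std::size_t hammer(SymbolTable &table, unsigned numThreads, int numNames) {
  std::vector<std::thread> threads;
  threads.reserve(numThreads);
  for (unsigned t = 0; t < numThreads; ++t)
    threads.emplace_back([&table, numNames] {
      for (int i = 0; i < numNames; ++i)
        table.getOrCreate("sym" + std::to_string(i));
    });
  for (std::thread &th : threads)
    th.join();
  return table.size();
}
```

Without the `lock_guard`, concurrent `find` and `emplace` calls on the same `std::map` would be a data race with undefined behavior, which can show up as exactly this kind of intermittent failure.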

@lhames
Contributor

lhames commented Sep 27, 2021

I wasn't able to reproduce this locally a second time. I've committed 1ea8d12, which may fix this.

Please let me know if there are any more bot failures. The advantage of this failing more frequently (and on more bots) is that we can at least gain some confidence that the problem has been fixed if/when the failures disappear.

@lhames
Contributor

lhames commented Sep 27, 2021

Nope, that wasn't it -- I just saw the same crash at llvm::orc::LocalIndirectStubsManager<llvm::orc::OrcX86_64_SysV>::~LocalIndirectStubsManager() + 28 (IndirectionUtils.h:363).

@lhames
Contributor

lhames commented Sep 27, 2021

rbp: 0x00008003e07cbb60 rsp: 0x00007ffee07cbb50

RBP looks bogus -- I guess that's why we're not getting a backtrace for thread 0.
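One way to see why this RBP value is suspicious, assuming 48-bit virtual addressing (an assumption; the comment does not say which check was applied): on x86-64, bits 63 through 47 of a valid pointer must all be equal. The sketch below uses a hypothetical helper, `isCanonical48`, to test the two register values quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Assumes 48-bit virtual addresses: an address is canonical when
// bits 63..47 are all zeros or all ones.
constexpr bool isCanonical48(std::uint64_t addr) {
  const std::uint64_t top = addr >> 47; // the 17 bits 63..47
  return top == 0 || top == 0x1FFFF;
}

// Register values quoted in the crash report above.
constexpr std::uint64_t kRbp = 0x00008003e07cbb60ULL;
constexpr std::uint64_t kRsp = 0x00007ffee07cbb50ULL;

static_assert(!isCanonical48(kRbp), "RBP is not a canonical address");
static_assert(isCanonical48(kRsp), "RSP is a canonical address");
```

Under that assumption RSP is a plausible user-space stack address while RBP is not a valid address at all. RBP also sits about 20 GiB above RSP (a difference of 0x500000010), so a frame-pointer-based unwinder would have nothing valid to follow.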

@lhames
Contributor

lhames commented Nov 17, 2021

Nico -- Have you seen any more failures due to this?

I haven't seen any locally, but I'm also 99.9% certain it hasn't been fixed. I think that moving other tests from XFAIL (with crashes, and the attendant load of the crash reporter) to UNSUPPORTED might have made this less likely to trigger.

@nico
Contributor Author

nico commented Nov 23, 2021

I haven't seen this in a few weeks.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021