Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thread step-out -m all-threads failing on FreeBSD #19721

Closed
emaste opened this issue Apr 4, 2014 · 17 comments
Closed

thread step-out -m all-threads failing on FreeBSD #19721

emaste opened this issue Apr 4, 2014 · 17 comments
Labels
bugzilla Issues migrated from bugzilla lldb

Comments

@emaste
Copy link
Member

emaste commented Apr 4, 2014

Bugzilla Link 19347
Version unspecified
OS FreeBSD
CC @jimingham

Extended Description

FAIL: test_step_all_threads_with_dwarf (TestThreadStepOut.ThreadStepOutTestCase)

  1. one thread stops at a breakpoint
  2. "bt all" shows the other thread with an IP of the breakpoint + 1
  3. execute "thread step-out -m all-threads"
  4. selected thread executes "step over breakpoint trap" with other threads suspended
  5. execution stops at the breakpoint in the other thread

feynman% bin/lldb /tank/emaste/src/llvm//tools/lldb/test/functionalities/thread/step_out/a.out
Current executable set to '/tank/emaste/src/llvm//tools/lldb/test/functionalities/thread/step_out/a.out' (x86_64).
(lldb) b 33
Breakpoint 1: where = a.out`step_out_of_here() + 4 at main.cpp:33, address = 0x0000000000400754
(lldb) run
Process 59762 launching
Process 59762 stopped
Process 59762 launched: '/tank/emaste/src/llvm//tools/lldb/test/functionalities/thread/step_out/a.out' (x86_64)
Process 59762 stopped

  • thread #​2: tid = 101824, 0x0000000000400754 a.outstep_out_of_here() + 4 at main.cpp:33, stop reason = breakpoint 1.1 frame #​0: 0x0000000000400754 a.outstep_out_of_here() + 4 at main.cpp:33
    30 volatile int g_test = 0;
    31
    32 void step_out_of_here() {
    -> 33 g_test += 5; // Set breakpoint here
    34 }
    35
    36 void *
    (lldb) bt all

  • thread #​2: tid = 101824, 0x0000000000400754 a.out`step_out_of_here() + 4 at main.cpp:33, stop reason = breakpoint 1.1

    • frame #​0: 0x0000000000400754 a.outstep_out_of_here() + 4 at main.cpp:33 frame #​1: 0x00000000004008bb a.outthread_func(input=0x0000000000000000) + 331 at main.cpp:43
      frame #​2: 0x0000000800822dc4 libthr.so.3`thread_start(curthread=0x0000000801807c00) + 260 at thr_create.c:284

    thread #​3: tid = 101583, 0x0000000000400755 a.outstep_out_of_here() + 5 at main.cpp:33 frame #​0: 0x0000000000400755 a.outstep_out_of_here() + 5 at main.cpp:33
    frame #​1: 0x00000000004008bb a.outthread_func(input=0x0000000000000000) + 331 at main.cpp:43 frame #​2: 0x0000000800822dc4 libthr.so.3thread_start(curthread=0x0000000801807800) + 260 at thr_create.c:284

    thread #​1: tid = 102182, 0x000000080082a59c libthr.so.3 at _umtx_op_err.S:37
    frame #​0: 0x000000080082a59c libthr.so.3 at _umtx_op_err.S:37
    (lldb) thread step-out -m all-threads
    Process 59762 stopped

  • thread #​3: tid = 101583, 0x0000000000400754 a.outstep_out_of_here() + 4 at main.cpp:33, stop reason = breakpoint 1.1 frame #​0: 0x0000000000400754 a.outstep_out_of_here() + 4 at main.cpp:33
    30 volatile int g_test = 0;
    31
    32 void step_out_of_here() {
    -> 33 g_test += 5; // Set breakpoint here
    34 }
    35
    36 void *

with step logging:

(lldb) thread step-out -m all-threads
Thread::PushPlan(0x0x80f88e800): "Stepping out from address 0x400754 to return address 0x4008bb using breakpoint site -6", tid = 0x1908f.
Process::PrivateResume() m_stop_id = 4, public state: stopped private state: stopped
Thread::PushPlan(0x0x80f88e800): "Single stepping past breakpoint site 6 at 0x400754", tid = 0x1908f.
WillResume Thread #​2 (0x0x80f88e600): tid = 0x190a9, pc = 0x00400755, sp = 0x7fffff9fcf50, fp = 0x7fffff9fcf50, plan = 'base plan', state = suspended, stop others = 0
WillResume Thread #​3 (0x0x80f88e800): tid = 0x1908f, pc = 0x00400754, sp = 0x7fffffbfdf50, fp = 0x7fffffbfdf50, plan = 'Step over breakpoint trap', state = stepping, stop others = 1
WillResume Thread #​1 (0x0x808f4ac00): tid = 0x1955b, pc = 0x80082a59c, sp = 0x7fffffffd3a8, fp = 0x801807400, plan = 'base plan', state = suspended, stop others = 0
Process thinks the process has resumed.
Current Plan for thread 2(0x80f88e600) (0x190a9, suspended): base plan being asked whether we should report run.
Current Plan for thread 3(0x80f88e800) (0x1908f, stepping): Step over breakpoint trap being asked whether we should report run.
Current Plan for thread 1(0x808f4ac00) (0x1955b, suspended): base plan being asked whether we should report run.
(lldb)
ThreadList::ShouldStop: 3 threads
Thread::ShouldStop for tid = 0x190a9 0x190a9, should_stop = 0 (ignore since thread was suspended)
Thread::ShouldStop(0x80f88e800) for tid = 0x1908f 0x1908f, pc = 0x000000000040075b
^^^^^^^^ Thread::ShouldStop Begin ^^^^^^^^
Plan stack initial state:
Plan Stack for thread #​3: tid = 0x1908f, stack_size = 3
Element 2: Single stepping past breakpoint site 6 at 0x400754
Element 1: Stepping out from address 0x400754 to return address 0x4008bb using breakpoint site -6
Element 0: Base thread plan.

Plan Step over breakpoint trap explains stop, auto-continue 1.
Plan Step over breakpoint trap should stop: 0.
Completed step over breakpoint plan.
Popping plan: "Step over breakpoint trap", tid = 0x1908f.
Plan Step out should stop: 0.
Plan stack final state:
Plan Stack for thread #​3: tid = 0x1908f, stack_size = 2
Element 1: Stepping out from address 0x400754 to return address 0x4008bb using breakpoint site -6
Element 0: Base thread plan.
Completed Plan Stack: 1 elements.
Element 0: Single stepping past breakpoint site 6 at 0x400754

vvvvvvvv Thread::ShouldStop End (returning 0) vvvvvvvv
Thread::ShouldStop for tid = 0x1955b 0x1955b, should_stop = 0 (ignore since thread was suspended)
ThreadList::ShouldStop overall should_stop = 0
ThreadList::ShouldReportStop 3 threads
Thread::ShouldReportStop() tid = 0x190a9: returning vote 0 (temporary state was suspended or invalid)
Thread::ShouldReportStop() tid = 0x1908f: returning vote for complete stack's back plan
ThreadPlan::ShouldReportStop() returning vote: no
Thread::ShouldReportStop() tid = 0x1955b: returning vote 0 (temporary state was suspended or invalid)
ThreadList::ShouldReportStop returning no
Process::PrivateResume() m_stop_id = 5, public state: running private state: stopped
WillResume Thread #​2 (0x0x80f88e600): tid = 0x190a9, pc = 0x00400755, sp = 0x7fffff9fcf50, fp = 0x7fffff9fcf50, plan = 'base plan', state = running, stop others = 0
WillResume Thread #​3 (0x0x80f88e800): tid = 0x1908f, pc = 0x0040075b, sp = 0x7fffffbfdf50, fp = 0x7fffffbfdf50, plan = 'Step out', state = running, stop others = 0
WillResume Thread #​1 (0x0x808f4ac00): tid = 0x1955b, pc = 0x80082a59c, sp = 0x7fffffffd3a8, fp = 0x801807400, plan = 'base plan', state = running, stop others = 0
Process thinks the process has resumed.

ThreadList::ShouldStop: 3 threads
Thread::ShouldStop(0x80f88e600) for tid = 0x190a9 0x190a9, pc = 0x0000000000400754
^^^^^^^^ Thread::ShouldStop Begin ^^^^^^^^
Plan stack initial state:
Plan Stack for thread #​2: tid = 0x190a9, stack_size = 1
Element 0: Base thread plan.

Plan base plan explains stop, auto-continue 0.
Base plan discarding thread plans for thread tid = 0x190a9 (breakpoint hit.)
Discarding thread plans for thread (tid = 0x190a9, force 0)
Base plan says should stop: 1.
Plan stack final state:
Plan Stack for thread #​2: tid = 0x190a9, stack_size = 1
Element 0: Base thread plan.

vvvvvvvv Thread::ShouldStop End (returning 1) vvvvvvvv
Thread::ShouldStop for tid = 0x1908f 0x1908f, pc = 0x000000000040075b, should_stop = 0 (ignore since no stop reason)
Thread::ShouldStop for tid = 0x1955b 0x1955b, pc = 0x000000080082a59c, should_stop = 0 (ignore since no stop reason)
ThreadList::ShouldStop overall should_stop = 1
Process 59766 stopped

  • thread #​2: tid = 102569, 0x0000000000400754 a.outstep_out_of_here() + 4 at main.cpp:33, stop reason = breakpoint 1.1 frame #​0: 0x0000000000400754 a.outstep_out_of_here() + 4 at main.cpp:33
    30 volatile int g_test = 0;
    31
    32 void step_out_of_here() {
    -> 33 g_test += 5; // Set breakpoint here
    34 }
    35
    36 void *
@emaste
Copy link
Member Author

emaste commented Apr 4, 2014

bisection shows this fails as of r205497: "Make the fail messages"

@jimingham
Copy link
Collaborator

This patch changed the way "Thread::SetResumeState" works. The problem was that some code intended to say "I would like this thread to run if nobody has any other opinion", e.g. the ThreadList::WillResume. Other code intended "make this thread runnable for sure.", e.g. Thread::Suspend. So I added an override_suspend parameter which should be false for the former folks, and true for the latter.

I noticed there were a few places where SetResumeState was called on the Linux/FreeBSD side, but in each case they were proceeded by some comment like:

// FIXME: This should probably happen somewhere else.

or

// TODO: the line below shouldn't really be done, but
// the POSIXThread might rely on this so I will leave this in for now

so I wasn't sure what to do. Apparently you may need to pass override_suspend=true to these calls? false is the default.

@emaste
Copy link
Member Author

emaste commented Apr 9, 2014

// FIXME: This should probably happen somewhere else.

// TODO: the line below shouldn't really be done, but
// the POSIXThread might rely on this so I will leave this in for now

Those ones are in ProcessPOSIX, but in member functions that are overridden in ProcessFreeBSD and thus only used on Linux, so they shouldn't be the fault here.

Thinking about this some more though, "step 2" in my initial description of the problem confuses me: it's surprising that the IP reported for the "other" thread is one beyond the breakpoint.

I think this is an issue with how "simultaneous" breakpoints get reported. "Our" thread executed the 0xCC trap, and ptrace reports the expected IP. The IP in the "other" thread suggests it has executed a one-byte instruction (the original instruction was wider).

@jimingham
Copy link
Collaborator

Is this on an x86 machine? The trap on x86 is executed. So when you are told you hit the trap you are supposed to decrement the PC by 1. Maybe when both threads hit the breakpoint, you are decrement the pc on one thread, but forgetting to do so on the other?

@emaste
Copy link
Member Author

emaste commented Apr 9, 2014

Yes, it is on x86.

What I've found so far is that both threads have executed the INT3, but the callback on FreeBSD only had a mechanism to be notified of one thread hitting the breakpoint.

I have some WIP that changes this to scan the whole list of threads after a stop event, and derive the breakpoint stop infos from there. It's not working 100% yet, but seems to be the best approach, and is also more in-line with ProcessGDBRemote, which I'm using as a template.

This gives me output as shown below, with two threads simultaneously reporting the breakpoint:

Process 50519 stopped

  • thread #​2: tid = 102352, 0x0000000000400754 a.outstep_out_of_here() + 4 at sim.cpp:33, stop reason = breakpoint 1.1 frame #​0: 0x0000000000400754 a.outstep_out_of_here() + 4 at sim.cpp:33
    30 volatile int g_test = 0;
    31
    32 void step_out_of_here() {
    -> 33 g_test += 5; // Set breakpoint here
    34 }
    35
    36 void *
    thread #​3: tid = 102344, 0x0000000000400754 a.outstep_out_of_here() + 4 at sim.cpp:33, stop reason = breakpoint 1.1 frame #​0: 0x0000000000400754 a.outstep_out_of_here() + 4 at sim.cpp:33
    30 volatile int g_test = 0;
    31
    32 void step_out_of_here() {
    -> 33 g_test += 5; // Set breakpoint here
    34 }
    35
    36 void *

@emaste
Copy link
Member Author

emaste commented Apr 9, 2014

There's a bit of a problem as well in that the other breakpoint SIGTRAP is already queued when I handle the first one, and it gets delivered when we try to step/continue.

@jimingham
Copy link
Collaborator

Yes, the way Mac OS X works you when you get one exception (equivalent of the SIGTRAP you get) we poll the exception port till we don't get any more exceptions, and then report the whole bundle. Don't think there's a polling version of ptrace, however.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 11, 2014

Just adding that there are unit test failures in Linux related to this as well, specifically TestCreateDuringStep.py and TestExitDuringStep.py which simply run forever now. If I can help with testing anything just let me know.

@jimingham
Copy link
Collaborator

So if it is true that JUST reverting r205497 makes these failures go away, then the fix should be easy. All that patch did was change the behavior of Thread::SetResumeState(eStateRunning) such that if the state had been eStateSuspended, then it would not change the state. If you want to revert that call to the previous behavior, change the call to: Thread::SetResumeState(eStateRunning, true).

Note, however, that the thread state should only be being set to eStateSuspended when the user wants to keep that thread from running. In the normal course of operation, we suspend some threads to do things like step over breakpoints, etc. But that only affects the m_temporary_resume_state in the thread.

@emaste
Copy link
Member Author

emaste commented Apr 11, 2014

Yes, I'm able to have the test pass on FreeBSD by just reverting r205497. However, this investigation highlighted an underlying problem with the way simultaneous breakpoints are handled, so I'd like to resolve that first.

When we stop for the first trap we could determine that another thread has a pending trap, but I don't think there's an equivalent to polling the exception port as on OS X as you describe.

What I might be able to do is poll the threads after the trap, and issue a PT_CONTINUE if any other than the reported stop thread indicate a pending trap, which should deliver it immediately and both/all could be handled together.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 18, 2014

Hey Ed - I'd like to resolve this on the Linux side. You had indicated you had some code in motion for this. What was the status of your code change? Would it be FreeBSD only? I can disable these tests on Linux, try Jim's override suggestion, back out the change, wait for Codeplay's patch (Luke just indicated they are working on one on the lldb-dev list), etc.

If I don't hear shortly, I'll just disable these on Linux for the moment and re-enable when we work out what we're doing here.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 18, 2014

At the moment I'm trying the passing of true to the defaulted second parameter of SetResumeState() in the two calls in POSIXThread.

This gets me to this state (2 failures as opposed to hangs):

FAIL: LLDB (suite) :: TestAttachResume.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #​1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestPtrRef2Typedef.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #​1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
ninja: build stopped: subcommand failed.

I consider this a step in the right direction but I will look into these two first to see if they're fallout or unrelated.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 18, 2014

The TestAttachResume failure looks to be related. It's listed as hitting the breakpoint twice now.

This code in process_attach_continue_interrupt_detach():

    self.assertTrue(wait_for_state(lldb.eStateStopped),
        'Process not stopped after breakpoint')
    self.expect('br list', 'Breakpoint not hit',
        patterns = ['hit count = 1'])

is hitting twice, producing this failure:

runCmd: br list
output: Current breakpoints:
1: file = 'main.c', line = 12, locations = 1, resolved = 1, hit count = 2
1.1: where = AttachResume`start + 29 at main.c:12, address = 0x000000000040070
d, resolved, hit count = 2

Expecting pattern: hit count = 1
Not matched

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 18, 2014

I found the smallest change that got Linux running:

Waiting for Emacs...
Sending source/Plugins/Process/POSIX/POSIXThread.cpp
Transmitting file data .
Committed revision 206618.

I filed a follow-up bug for a resume test that now fails due to getting hit twice when expected once, quite possibly related to the stop-gap fix above. Marked that test as expected fail on Linux.

@emaste
Copy link
Member Author

emaste commented Apr 21, 2014

FWIW TestAttachResume works for me (although a related failure in llvm.org/pr19310 seems to be no longer reproducible).

@emaste
Copy link
Member Author

emaste commented Nov 5, 2020

Appears test has been renamed; as of today
$ bin/lldb-dotest -p TestThreadStepOut -f test_step_all_threads_dwarf
reports XFAIL. I have not confirmed that the issue is identical to that reported in 2014.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
@mgorny
Copy link
Member

mgorny commented Jun 17, 2022

It seems that these tests are passing today. The tests were also marked to pass on Linux recently, so I guess the failures had a common cause. I'll be removing the XFAIL in a minute.

@mgorny mgorny closed this as completed Jun 17, 2022
mgorny added a commit that referenced this issue Jun 17, 2022
Fixes #19721
Fixes #18440
Partially fixes bug #47660
Fixes #47761
Fixes #47763

Sponsored by: The FreeBSD Foundation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla lldb
Projects
None yet
Development

No branches or pull requests

4 participants