Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No inline debug info generated for small functions when optimizing #26004

Open
dnovillo opened this issue Nov 24, 2015 · 14 comments
Open

No inline debug info generated for small functions when optimizing #26004

dnovillo opened this issue Nov 24, 2015 · 14 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla

Comments

@dnovillo
Copy link
Contributor

Bugzilla Link 25630
Version trunk
OS All
Blocks #30616
Attachments Test case, The same test case with more code in the inlined function
CC @adrian-prantl,@dwblaikie,@echristo,@hfinkel,@pogo59,@qcolombet,@sanjoy,@Melamoto

Extended Description

For one liner functions, LLVM seems to not be emitting inlined-at information in the final binary. The inlinedAt tags are there in the bytecode, but don't show up in the final binary.

If I change the code of the inlined function to have more than one line, then the inlinedAt markers survive to the final binary.

$ clang++ -O2 -g bug.cc -o bug
$ llvm-dwarfdump --debug-dump=info bug|grep -A6 DW_TAG_inlined_subroutine

When compiled with GCC, the inlined function is marked properly:

$ g++-4.9 -O2 -g bug.cc -o gcc-bug
$ llvm-dwarfdump --debug-dump=info gcc-bug|grep -A6 DW_TAG_inlined_subroutine
0x00000476: DW_TAG_inlined_subroutine [21] *
DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x028d => {0x000003da} "_Z3sumxx")
DW_AT_low_pc [DW_FORM_addr] (0x0000000000400559)
DW_AT_high_pc [DW_FORM_data8] (0x0000000000000003)
DW_AT_call_file [DW_FORM_data1] ("./bug.cc")
DW_AT_call_line [DW_FORM_data1] (8)
DW_AT_GNU_discriminator [DW_FORM_data1] (0x02)

If I modified the inlined function to have more code, LLVM will generate a DW_TAG_inlined_subroutine tag. I did this in the no-bug.cc test case:

$ clang++ -g -O2 no-bug.cc -o no-bug
$ llvm-dwarfdump --debug-dump=info no-bug|grep -A6 DW_TAG_inlined_subroutine
0x000000b1: DW_TAG_inlined_subroutine [9] *
DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x004c => {0x0000004c} "_Z3sumxx")
DW_AT_low_pc [DW_FORM_addr] (0x000000000040063e)
DW_AT_high_pc [DW_FORM_data4] (0x00000010)
DW_AT_call_file [DW_FORM_data1] ("./no-bug.cc")
DW_AT_call_line [DW_FORM_data1] (13)
DW_AT_GNU_discriminator [DW_FORM_data1] (0x01)

@dnovillo
Copy link
Contributor Author

assigned to @Melamoto

@dwblaikie
Copy link
Collaborator

This is likely a debug location optimization problem. It would affect inlining or without inlining.

If anyone wants to look into this, just watch which optimization removes the last use of the DebugLoc with the inlinedAt info. Chances are two instructions, one inside and one outside the inlined function, might get coalesced into some other operation and the outer instruction's location is chosen. Then we might split it back out (there was a bug filed recently about switch lowering doing something like this, maybe?) into multiple instructions in the backend without the extra information available anymore.

I'll see if I can reduce this a bit further...

@dwblaikie
Copy link
Collaborator

With some changes this seemed to repro at -O1 for me. Still get an add instruction that looks like it's sufficiently similar to the add in the inlined function that it should be annotated as such.

In the O1 IR, the add instruction appears to still have the right location. So I guess this is somewhere down in selection dag or other backend-y optimizations.

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 6, 2016

Slightly simpler testcase, removing the dependency on printf

int sum(int x, int y) { return x + y; }

int main() {
int s, i = 0;
while (i++ < 400)
if (i != 100) s = sum(i, s); else s = 30;
return s;
}

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 6, 2016

It's definitely a problem in the backend.
Fully reduced testcase:

define i32 @​main() !dbg !​3 {
%add.i.4 = add nsw i32 undef, undef, !dbg !​6
%.add.i.4 = select i1 undef, i32 30, i32 %add.i.4, !dbg !​9
ret i32 %.add.i.4, !dbg !​10
}
!llvm.dbg.cu = !{#0}
!llvm.module.flags = !{#2}
!​0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !​1, producer: "clang version 4.0.0 (trunk 288665) (llvm/trunk 288759)", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly)
!​1 = !DIFile(filename: "bug.cc", directory: "/home/davide/work/llvm/build-clang/bin")
!​2 = !{i32 2, !"Debug Info Version", i32 3}
!​3 = distinct !DISubprogram(name: "main", scope: !​1, file: !​1, line: 3, type: !​4, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: false, unit: !​0)
!​4 = !DISubroutineType(types: !​5)
!​5 = !{}
!​6 = !DILocation(line: 1, column: 34, scope: !​7, inlinedAt: !​8)
!​7 = distinct !DISubprogram(name: "sum", scope: !​1, file: !​1, line: 1, type: !​4, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !​0)
!​8 = distinct !DILocation(line: 6, column: 23, scope: !​3)
!​9 = !DILocation(line: 6, column: 9, scope: !​3)
!​10 = !DILocation(line: 7, column: 3, scope: !​3)

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 6, 2016

$ cat run.sh
#!/bin/sh

rm -rf bugopt bugopt.o
echo "Result at O0"
/home/davide/work/llvm/build-clang/bin/llc -O0 $1 -o bugopt
/home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o
/home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info bugopt.o |grep -A6 DW_TAG_inlined_subroutine

echo "Result at 01"
rm -rf bugopt bugopt.o
/home/davide/work/llvm/build-clang/bin/llc -O1 $1 -o bugopt
/home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o
/home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info bugopt.o |grep -A6 DW_TAG_inlined_subroutine

$ ./run.sh reduced.ll
Result at O0
0x00000040: DW_TAG_inlined_subroutine [4]
DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x002a => {0x0000002a} "sum")
DW_AT_low_pc [DW_FORM_addr] (0x0000000000000005)
DW_AT_high_pc [DW_FORM_data4] (0x00000002)
DW_AT_call_file [DW_FORM_data1] ("/home/davide/work/llvm/build-clang/bin/bug.cc")
DW_AT_call_line [DW_FORM_data1] (6)

Result at 01
$

Taking this, trying to understand which backend pass destroys the information.

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 7, 2016

I have a theory now.
Please ignore my previous reduction, I reduced the testcase too much.
After another (painful) reduction, I think I identified the pass which introduces the problem (see new attached testcase), that is LoopStrengthReduce (which is run as part of llc -O1 and beyond).
Before LoopStrengthReduce runs the IR looks like:

%add.i = add nsw i32 %inc5, %s.04, !dbg !​32
tail call void @​llvm.dbg.value(metadata i32 %add.i, i64 0, metadata !​22, metadata !​11), !dbg !​23
%.add.i = select i1 %cmp1, i32 30, i32 %add.i, !dbg !​33

where
!​32 = !DILocation(line: 1, column: 34, scope: !​6, inlinedAt: !​29)
!​33 = !DILocation(line: 6, column: 9, scope: !​17)

So the add instruction contains the inlinedAt info and it points which points to the correct location in my reduced *.cc testcase:
!​29 = distinct !DILocation(line: 6, column: 23, scope: !​30)
that is -> sum(i, s)

and the select instruction points to the i inside the if (i != 100)

During LoopStrengthReduce the add instruction is replaced, so the IR looks like this:

tail call void @​llvm.dbg.value(metadata !​2, i64 0, metadata !​22, metadata !​11), !dbg !​23
%2 = add i32 %.add.i, %lsr.iv, !dbg !​32
%3 = add i32 %2, -3, !dbg !​32
%.add.i.1 = select i1 %cmp1.1, i32 30, i32 %3, !dbg !​32

where
!​32 = !DILocation(line: 6, column: 9, scope: !​17)

so, both the add and the select point to the i inside the if i != 100

Stepping through with the debuggger reveals that when the Expander runs it inserts the add instructions assigning the debug metadata of select (i.e. !​33 before LSR, which becomes !​32 after renaming the metadata because the original add is dead).

I'm not entirely sure what's the best solution here, so I'm asking Quentin.
I think that LSR (or SCEV) should (if possible) look at assigning the correct metadata when creating the new add instructions (and/or preserving the inlineAt info).

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 7, 2016

reduced testcase

@qcolombet
Copy link
Collaborator

It sounds like a job for the SCEVExpander at first glance.

AFAICT, LSR is not playing with the debug locations and the SCEVExpander should have access to the Value, hence the debug location, related to the different expressions it expands. Right now, the expander uses the debug location of the insertion point instead of the original value, if I am not mistaken.

I am not familiar with the internals of SCEV though.

Sanjoy, what's your take on this?

@sanjoy
Copy link
Contributor

sanjoy commented Dec 11, 2016

It sounds like a job for the SCEVExpander at first glance.

AFAICT, LSR is not playing with the debug locations and the SCEVExpander
should have access to the Value, hence the debug location, related to the
different expressions it expands. Right now, the expander uses the debug
location of the insertion point instead of the original value, if I am not
mistaken.

Is it possible to actually get to (the debug location of the) original value from within SCEVExpander? I'd think the transform that is requesting an expansion would have a better idea of what the debug location should be.

I am not familiar with the internals of SCEV though.

Sanjoy, what's your take on this?

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 13, 2016

I'm not familiar with neither LSR or the internals of SCEV, but I spent a decent amount of time stepping in the debugger today.
So, yes, as Quentin pointed out SCEVExpander::expand() computes an insertion point for the SCEV object and assigns to the newly inserted instructions the DI of the instruction correspoding to the insertion point.
I'm not entirely sure if there's an easy way to access the value we're trying to replace from inside SCEVExpander, but it's available in the caller stack (LF.OperandValToReplace).

As a quick hack, I tried to force the debug location to be set correctly for this case and it seems to work, but I'm worried this is too specific to my case.

diff --git a/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index a61f646..7019d56 100644
--- a/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -4576,6 +4576,11 @@ Value *LSRInstance::Expand(const LSRUse &LU,
// unfolded offsets. LSR assumes they both live next to their uses.
if (!Ops.empty()) {
Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);

  • if (isa(FullV) &&
  •    isa<Instruction>(LF.OperandValToReplace)) {
    
  •  (cast<Instruction>(FullV))->setDebugLoc(
    
  •    (cast<Instruction>(LF.OperandValToReplace))->getDebugLoc());
    
  • }
    Ops.clear();
    Ops.push_back(SE.getUnknown(FullV));
    }

So, I think there are two possible solutions to the problem:

  1. LSR updates the debug locations, as my hack does, but maybe in a more systematic way.
  2. expandCodeFor() gets another overload where it takes the value that is getting replaced (or equivalently its debugloc) and SCEVExpander sets the correct debug location accordingly.

Does this make sense? Ideas?

@sanjoy
Copy link
Contributor

sanjoy commented Dec 24, 2016

So, I think there are two possible solutions to the problem:

  1. LSR updates the debug locations, as my hack does, but maybe in a more
    systematic way.
  2. expandCodeFor() gets another overload where it takes the value that is
    getting replaced (or equivalently its debugloc) and SCEVExpander sets the
    correct debug location accordingly.

Does this make sense? Ideas?

I think we discussed this on IRC some time back, but I think the variant of (2) that changes expandCodeFor to take a DebugLoc is the right path forward.

@qcolombet
Copy link
Collaborator

So, I think there are two possible solutions to the problem:

  1. LSR updates the debug locations, as my hack does, but maybe in a more
    systematic way.
  2. expandCodeFor() gets another overload where it takes the value that is
    getting replaced (or equivalently its debugloc) and SCEVExpander sets the
    correct debug location accordingly.

Does this make sense? Ideas?

I think we discussed this on IRC some time back, but I think the variant of
(2) that changes expandCodeFor to take a DebugLoc is the right path forward.

Agreed, #​2 seems sensible to me.

@adrian-prantl
Copy link
Collaborator

mentioned in issue #30616

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

6 participants