25630 – No inline debug info generated for small functions when optimizing

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Status:	NEW

Alias:	None

Product:	new-bugs
Classification:	Unclassified
Component:	new bugs
Version:	trunk
Hardware:	PC All

Importance:	P normal
Assignee:	Stephen Tozer

URL:
Keywords:

Depends on:
Blocks:	31268

Reported:	2015-11-24 15:49 PST by Diego Novillo
Modified:	2019-05-15 03:24 PDT
CC List:	12 users

See Also:
Fixed By Commit(s):

Attachments
Test case (254 bytes, text/plain) 2015-11-24 15:49 PST, Diego Novillo
The same test case with more code in the inlined function (296 bytes, application/octet-stream) 2015-11-24 15:49 PST, Diego Novillo
reduced testcase (7.66 KB, application/octet-stream) 2016-12-06 20:36 PST, Davide Italiano

Description Diego Novillo 2015-11-24 15:49:15 PST

Created attachment 15349 [details]
Test case

For one-liner functions, LLVM does not seem to emit inlined-at information in the final binary.  The inlinedAt tags are present in the bitcode, but they don't show up in the final binary.

If I change the code of the inlined function to have more than one line, then the inlinedAt markers survive to the final binary.

$ clang++ -O2 -g bug.cc -o bug
$ llvm-dwarfdump --debug-dump=info bug|grep -A6 DW_TAG_inlined_subroutine

When compiled with GCC, the inlined function is marked properly:

$ g++-4.9 -O2 -g bug.cc -o gcc-bug
$ llvm-dwarfdump --debug-dump=info gcc-bug|grep -A6 DW_TAG_inlined_subroutine
0x00000476:       DW_TAG_inlined_subroutine [21] *
                    DW_AT_abstract_origin [DW_FORM_ref4]        (cu + 0x028d => {0x000003da} "_Z3sumxx")
                    DW_AT_low_pc [DW_FORM_addr] (0x0000000000400559)
                    DW_AT_high_pc [DW_FORM_data8]       (0x0000000000000003)
                    DW_AT_call_file [DW_FORM_data1]     ("./bug.cc")
                    DW_AT_call_line [DW_FORM_data1]     (8)
                    DW_AT_GNU_discriminator [DW_FORM_data1]     (0x02)


If I modify the inlined function to have more code, LLVM generates a DW_TAG_inlined_subroutine tag.  I did this in the no-bug.cc test case:

$ clang++ -g -O2 no-bug.cc -o no-bug
$ llvm-dwarfdump --debug-dump=info no-bug|grep -A6 DW_TAG_inlined_subroutine
0x000000b1:     DW_TAG_inlined_subroutine [9] *
                  DW_AT_abstract_origin [DW_FORM_ref4]  (cu + 0x004c => {0x0000004c} "_Z3sumxx")
                  DW_AT_low_pc [DW_FORM_addr]   (0x000000000040063e)
                  DW_AT_high_pc [DW_FORM_data4] (0x00000010)
                  DW_AT_call_file [DW_FORM_data1]       ("./no-bug.cc")
                  DW_AT_call_line [DW_FORM_data1]       (13)
                  DW_AT_GNU_discriminator [DW_FORM_data1]       (0x01)

Comment 1 Diego Novillo 2015-11-24 15:49:45 PST

Created attachment 15350 [details]
The same test case with more code in the inlined function

Comment 2 David Blaikie 2015-11-24 16:13:19 PST

This is likely a debug location optimization problem, one that would affect locations with or without inlining.

If anyone wants to look into this, just watch which optimization removes the last use of the DebugLoc with the inlinedAt info. Chances are two instructions, one inside and one outside the inlined function, might get coalesced into some other operation and the outer instruction's location is chosen. Then we might split it back out (there was a bug filed recently about switch lowering doing something like this, maybe?) into multiple instructions in the backend without the extra information available anymore.

I'll see if I can reduce this a bit further...
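Why losing the "last use" matters can be pictured with a toy model. The sketch below is not LLVM code: `Loc` and `hasInlinedScope` are hypothetical names, and it rests on the simplifying assumption that an inlined-subroutine entry is only produced when at least one surviving instruction still carries a location with an inlinedAt link.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a debug location (hypothetical, not llvm::DebugLoc).
struct Loc {
  unsigned line;           // source line of the instruction
  std::string scope;       // function the line belongs to
  unsigned inlinedAtLine;  // call-site line, 0 means "not inlined"
};

// Returns true if any instruction still refers to `callee` as inlined code.
// Simplifying assumption: with no such instruction there is no address
// range left to describe, so no DW_TAG_inlined_subroutine would be emitted.
bool hasInlinedScope(const std::vector<Loc> &instrs,
                     const std::string &callee) {
  assert(!callee.empty() && "callee name must be given");
  for (const Loc &l : instrs)
    if (l.scope == callee && l.inlinedAtLine != 0)
      return true;
  return false;
}
```

In this model, once every instruction that came from `sum` has been re-stamped with a location in `main`, `hasInlinedScope` turns false, which matches the symptom of the tag silently vanishing from the output.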

Comment 3 David Blaikie 2015-11-24 16:28:37 PST

With some changes this seemed to repro at -O1 for me. Still get an add instruction that looks like it's sufficiently similar to the add in the inlined function that it should be annotated as such.

In the O1 IR, the add instruction appears to still have the right location. So I guess this is somewhere down in selection dag or other backend-y optimizations.

Comment 4 Davide Italiano 2016-12-06 14:38:51 PST

Slightly simpler testcase, removing the dependency on printf

int sum(int x, int y) { return x + y; }

int main() {
  int s, i = 0;
  while (i++ < 400)
    if (i != 100) s = sum(i, s); else s = 30;
  return s;
}

Comment 5 Davide Italiano 2016-12-06 15:45:48 PST

It's definitely a problem in the backend.
Fully reduced testcase:

define i32 @main() !dbg !3 {
  %add.i.4 = add nsw i32 undef, undef, !dbg !6
  %.add.i.4 = select i1 undef, i32 30, i32 %add.i.4, !dbg !9
  ret i32 %.add.i.4, !dbg !10
}
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 4.0.0 (trunk 288665) (llvm/trunk 288759)", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly)
!1 = !DIFile(filename: "bug.cc", directory: "/home/davide/work/llvm/build-clang/bin")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 3, type: !4, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: false, unit: !0)
!4 = !DISubroutineType(types: !5)
!5 = !{}
!6 = !DILocation(line: 1, column: 34, scope: !7, inlinedAt: !8)
!7 = distinct !DISubprogram(name: "sum", scope: !1, file: !1, line: 1, type: !4, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !0)
!8 = distinct !DILocation(line: 6, column: 23, scope: !3)
!9 = !DILocation(line: 6, column: 9, scope: !3)
!10 = !DILocation(line: 7, column: 3, scope: !3)

Comment 6 Davide Italiano 2016-12-06 15:47:09 PST

$ cat run.sh
#!/bin/sh

rm -rf bugopt bugopt.o
echo "Result at O0"
/home/davide/work/llvm/build-clang/bin/llc -O0 $1 -o bugopt
/home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o
/home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info bugopt.o |grep -A6 DW_TAG_inlined_subroutine

echo "Result at O1"
rm -rf bugopt bugopt.o
/home/davide/work/llvm/build-clang/bin/llc -O1 $1 -o bugopt
/home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o
/home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info bugopt.o |grep -A6 DW_TAG_inlined_subroutine

$ ./run.sh reduced.ll
Result at O0
0x00000040:     DW_TAG_inlined_subroutine [4]
                  DW_AT_abstract_origin [DW_FORM_ref4]  (cu + 0x002a => {0x0000002a} "sum")
                  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000005)
                  DW_AT_high_pc [DW_FORM_data4] (0x00000002)
                  DW_AT_call_file [DW_FORM_data1]       ("/home/davide/work/llvm/build-clang/bin/bug.cc")
                  DW_AT_call_line [DW_FORM_data1]       (6)

Result at O1
$

I'm taking this and trying to work out which backend pass destroys the information.

Comment 7 Davide Italiano 2016-12-06 20:35:17 PST

I have a theory now.
Please ignore my previous reduction, I reduced the testcase too much.
After another (painful) reduction, I think I identified the pass which introduces the problem (see new attached testcase), that is LoopStrengthReduce (which is run as part of llc -O1 and beyond).
Before LoopStrengthReduce runs the IR looks like:

  %add.i = add nsw i32 %inc5, %s.04, !dbg !32
  tail call void @llvm.dbg.value(metadata i32 %add.i, i64 0, metadata !22, metadata !11), !dbg !23
  %.add.i = select i1 %cmp1, i32 30, i32 %add.i, !dbg !33

where
!32 = !DILocation(line: 1, column: 34, scope: !6, inlinedAt: !29)
!33 = !DILocation(line: 6, column: 9, scope: !17)

So the `add` instruction carries the inlinedAt info, which points to the correct location in my reduced *.cc testcase:
!29 = distinct !DILocation(line: 6, column: 23, scope: !30)
that is -> sum(i, s)

and the `select` instruction points to the `i` inside the `if (i != 100)`

###
During LoopStrengthReduce the `add` instruction is replaced, so the IR looks like this:

  tail call void @llvm.dbg.value(metadata !2, i64 0, metadata !22, metadata !11), !dbg !23
  %2 = add i32 %.add.i, %lsr.iv, !dbg !32
  %3 = add i32 %2, -3, !dbg !32
  %.add.i.1 = select i1 %cmp1.1, i32 30, i32 %3, !dbg !32

where
!32 = !DILocation(line: 6, column: 9, scope: !17) 

so both the `add` and the `select` point to the `i` inside the `if (i != 100)`

Stepping through with the debugger reveals that when the Expander runs, it inserts the `add` instructions and assigns them the debug metadata of the `select` (i.e. !33 before LSR, which becomes !32 after the metadata is renumbered because the original add is dead).

I'm not entirely sure what the best solution is here, so I'm asking Quentin.
I think that LSR (or SCEV) should, if possible, assign the correct metadata when creating the new add instructions (and/or preserve the `inlinedAt` info).
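The behaviour described in this comment can be reproduced in a few lines of toy C++. This is only a sketch: `Loc`, `Inst` and `expandAtInsertionPoint` are hypothetical names, not LLVM classes or functions, and the line/column numbers are taken from the metadata quoted above.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins (hypothetical names, not LLVM classes).
struct Loc {
  unsigned line;
  unsigned column;
  unsigned inlinedAtLine;  // 0 means "not inlined"
};

struct Inst {
  std::string opcode;
  Loc loc;
};

// Mimics what the comment observes: the newly created instruction is placed
// before `insertPt` and simply copies the insertion point's location. The
// location of the instruction being replaced is never consulted.
Inst expandAtInsertionPoint(const std::string &opcode, const Inst &insertPt) {
  assert(!opcode.empty() && "opcode must be given");
  return Inst{opcode, insertPt.loc};
}
```

With the original `add` at line 1, column 34 (inlined at line 6) and the `select` at line 6, column 9, a call such as `expandAtInsertionPoint("add", select)` yields an `add` whose location is that of the `select`, so the inlinedAt link is gone once the old `add` is deleted.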

Comment 8 Davide Italiano 2016-12-06 20:36:22 PST

Created attachment 17731 [details]
reduced testcase

Comment 9 Quentin Colombet 2016-12-09 16:17:09 PST

It sounds like a job for the SCEVExpander at first glance.

AFAICT, LSR is not playing with the debug locations and the SCEVExpander should have access to the Value, hence the debug location, related to the different expressions it expands. Right now, the expander uses the debug location of the insertion point instead of the original value, if I am not mistaken.

I am not familiar with the internals of SCEV though.

Sanjoy, what's your take on this?

Comment 10 Sanjoy Das 2016-12-11 14:42:09 PST

(In reply to comment #9)
> It sounds like a job for the SCEVExpander at first glance.
> 
> AFAICT, LSR is not playing with the debug locations and the SCEVExpander
> should have access to the Value, hence the debug location, related to the
> different expressions it expands. Right now, the expander uses the debug
> location of the insertion point instead of the original value, if I am not
> mistaken.

Is it possible to actually get to (the debug location of the) original value from within SCEVExpander?  I'd think the transform that is requesting an expansion would have a better idea of what the debug location should be.
 
> I am not familiar with the internals of SCEV though.
> 
> Sanjoy, what's your take on this?

Comment 11 Davide Italiano 2016-12-12 19:42:02 PST

I'm familiar with neither LSR nor the internals of SCEV, but I spent a decent amount of time stepping through it in the debugger today.
So, yes, as Quentin pointed out, SCEVExpander::expand() computes an insertion point for the SCEV object and assigns to the newly inserted instructions the debug location of the instruction corresponding to the insertion point.
I'm not entirely sure whether there's an easy way to access the value we're trying to replace from inside SCEVExpander, but it's available in the caller's stack frame (LF.OperandValToReplace).

As a quick hack, I tried to force the debug location to be set correctly for this case and it seems to work, but I'm worried this is too specific to my case.

diff --git a/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index a61f646..7019d56 100644
--- a/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -4576,6 +4576,11 @@ Value *LSRInstance::Expand(const LSRUse &LU,
   // unfolded offsets. LSR assumes they both live next to their uses.
   if (!Ops.empty()) {
     Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);
+    if (isa<Instruction>(FullV) &&
+        isa<Instruction>(LF.OperandValToReplace)) {
+      (cast<Instruction>(FullV))->setDebugLoc(
+        (cast<Instruction>(LF.OperandValToReplace))->getDebugLoc());
+    }
     Ops.clear();
     Ops.push_back(SE.getUnknown(FullV));
   }


So, I think there are two possible solutions to the problem:
1) LSR updates the debug locations, as my hack does, but maybe in a more systematic way.
2) expandCodeFor() gets another overload where it takes the value that is getting replaced (or equivalently its debugloc) and SCEVExpander sets the correct debug location accordingly.

Does this make sense? Ideas?
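To make option 2 concrete, here is a toy sketch of what an extra overload might look like. It is not the real SCEVExpander interface: `toyExpandCodeFor`, `Loc` and `Inst` are hypothetical names chosen for illustration, and the real function takes a SCEV and a type rather than an opcode string.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins (hypothetical names, not LLVM classes).
struct Loc {
  unsigned line;
  unsigned column;
  unsigned inlinedAtLine;  // 0 means "not inlined"
};

struct Inst {
  std::string opcode;
  Loc loc;
};

// Current behaviour in this model: inherit the insertion point's location.
Inst toyExpandCodeFor(const std::string &opcode, const Inst &insertPt) {
  assert(!opcode.empty() && "opcode must be given");
  return Inst{opcode, insertPt.loc};
}

// Option 2 in this model: the caller also passes the location of the value
// being replaced, and the expanded instruction is stamped with it instead.
Inst toyExpandCodeFor(const std::string &opcode, const Inst &insertPt,
                      const Loc &replacedLoc) {
  Inst result = toyExpandCodeFor(opcode, insertPt);
  result.loc = replacedLoc;
  return result;
}
```

The appeal of this shape is that the transform requesting the expansion, which knows which value it is replacing, decides the location, while callers that pass nothing keep the existing behaviour.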

Comment 12 Sanjoy Das 2016-12-24 14:15:22 PST

(In reply to comment #11)
> So, I think there are two possible solutions to the problem:
> 1) LSR updates the debug locations, as my hack does, but maybe in a more
> systematic way.
> 2) expandCodeFor() gets another overload where it takes the value that is
> getting replaced (or equivalently its debugloc) and SCEVExpander sets the
> correct debug location accordingly.
> 
> Does this make sense? Ideas?

I think we discussed this on IRC some time back, but I think the variant of (2) that changes expandCodeFor to take a DebugLoc is the right path forward.

Comment 13 Quentin Colombet 2017-01-11 14:45:36 PST

(In reply to comment #12)
> (In reply to comment #11)
> > So, I think there are two possible solutions to the problem:
> > 1) LSR updates the debug locations, as my hack does, but maybe in a more
> > systematic way.
> > 2) expandCodeFor() gets another overload where it takes the value that is
> > getting replaced (or equivalently its debugloc) and SCEVExpander sets the
> > correct debug location accordingly.
> > 
> > Does this make sense? Ideas?
> 
> I think we discussed this on IRC some time back, but I think the variant of
> (2) that changes expandCodeFor to take a DebugLoc is the right path forward.

Agreed, #2 seems sensible to me.