"Mismatched function data" warnings when collecting profiles for clang #32849

Open
vedantk opened this issue Jun 18, 2017 · 16 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla clang:codegen confirmed Verified by a second party coverage tools:llvm-cov

Comments

@vedantk
Collaborator

vedantk commented Jun 18, 2017

Bugzilla Link 33502
Version trunk
OS All
CC @kcc,@morehouse,@Dor1s,@vedantk

Extended Description

After building some llvm tools with -fcoverage-mapping -fprofile-instr-generate, running the tools, and merging the resulting raw profiles, there are "N functions have mismatched data" warnings.

Investigate the source of these warnings, and improve the diagnostic to be more helpful/precise.

@vedantk
Collaborator Author

vedantk commented Jun 18, 2017

assigned to @vedantk

@Dor1s
Member

Dor1s commented Sep 20, 2017

We've noticed a similar issue in Chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=764484

We verified the following:

  1. the target has been compiled with proper flags
  2. the code we're interested in is definitely being executed
  3. the code has profiler instrumentation

However, we still get the warning, and the coverage report shows nothing for the files we are interested in.

It's worth mentioning that the target is fairly large (a CSS parser that is linked with lots of Blink stuff).

@vedantk
Collaborator Author

vedantk commented Sep 21, 2017

llvm-cov's diagnostic for this problem is a bit better now (r313853). If you pass "-dump" along with your regular llvm-cov invocation, you should see messages like:

hash-mismatch: No profile record found for 'main' with hash = 0xA

I've added some explanations of what these mismatches mean in the doxygen comments for the CoverageMapping class. Hopefully that will make it easier to figure out which object files contribute profile records with unexpected hashes.

I've also just built ToT clang, run "clang -help", merged the raw profile, and invoked llvm-cov. I get no function mismatches. Merging in profiles from a few additional clang invocations is also no issue. We get mismatches only after merging in profiles from other tools (llvm-tblgen, llc, FileCheck, etc).

To avoid this problem, we could name profiles in a way that makes it hard to merge together profiles for different binaries. E.g., that means teaching FileCheck to write to "FileCheck_%m.profraw", llc to "llc_%m.profraw", etc. But before doing that, we should narrow down the cause of the mismatches.
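A minimal sketch of the per-binary grouping idea, assuming raw profiles are already named `<tool>_<id>.profraw` (a hypothetical layout modeled on the "FileCheck_%m.profraw" example). The helper names `group_by_tool` and `merge_commands` are illustrative and not part of LLVM. The script only builds `llvm-profdata merge -o` command lines, one per tool, so that profiles from different binaries never end up in the same merged file:

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_tool(paths):
    """Group raw profile paths by the tool prefix in their file name.

    Assumes names like "llc_1234.profraw"; everything before the last
    underscore is treated as the tool name (an assumption, not an LLVM rule).
    """
    groups = defaultdict(list)
    for p in paths:
        stem = PurePath(p).stem            # "llc_1234" from "llc_1234.profraw"
        tool, sep, _ = stem.rpartition("_")
        groups[tool if sep else stem].append(p)
    return dict(groups)


def merge_commands(groups):
    """Build one `llvm-profdata merge` argument list per tool."""
    return [
        ["llvm-profdata", "merge", "-o", f"{tool}.profdata", *sorted(files)]
        for tool, files in sorted(groups.items())
    ]
```

Each returned argument list could be passed to `subprocess.run`. Keeping one indexed profile per binary is a design choice that makes an accidental cross-binary merge visible in the file names; it does not explain why such a merge produces mismatches.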

@Dor1s
Member

Dor1s commented Sep 21, 2017

Thanks for the clarifications, Vedant! We'll take a closer look using the ToT revision and the "-dump" option, and will post an update after that.

@Dor1s
Member

Dor1s commented Sep 27, 2017

Forwarding a comment from Chromium bug tracker:

I have built ToT llvm and clang and used them to build my fuzzer and generate the coverage report, and the issue remains.

After using the -dump option with llvm-cov, it complains that 148 functions have mismatched data and then prints 148 lines similar to the one below:

hash-mismatch: No profile record found for '_ZN7logging11CheckGEImplEiiPKc' with hash = 0x0

In all of these cases, the hash was 0x0 and the function name was some kind of mangled name (like the one above) beginning with the prefix "_ZN"
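With this many lines, a small script helps confirm patterns such as "every expected hash is 0x0". This is a rough sketch that assumes the exact message format quoted in this thread; `parse_hash_mismatches` and `tally_by_hash` are hypothetical helper names, not part of any LLVM tool:

```python
import re
from collections import Counter

# Matches the message format quoted in this thread, e.g.
#   hash-mismatch: No profile record found for 'main' with hash = 0xA
MISMATCH_RE = re.compile(
    r"hash-mismatch: No profile record found for '(?P<name>[^']+)' "
    r"with hash = 0x(?P<hash>[0-9A-Fa-f]+)"
)


def parse_hash_mismatches(text):
    """Return (function_name, expected_hash) pairs found in the dump text."""
    return [
        (m.group("name"), int(m.group("hash"), 16))
        for m in MISMATCH_RE.finditer(text)
    ]


def tally_by_hash(mismatches):
    """Count how many mismatched functions share each expected hash."""
    return Counter(h for _, h in mismatches)
```

Feeding the captured "-dump" output to `parse_hash_mismatches` and then to `tally_by_hash` shows at a glance whether the mismatches cluster on a single hash value.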

@Dor1s
Member

Dor1s commented Sep 27, 2017

The root cause seems to be a 0 hash (see https://bugs.chromium.org/p/chromium/issues/detail?id=764484#c14)

Also Jonathan came up with a small reproducer:

printf '#include <iostream>\n void h() {std::cout << 1 << std::endl;} int main() {h();h();h();return 0;}' > test.cc; clang++ -fprofile-instr-generate -fcoverage-mapping test.cc -o test; LLVM_PROFILE_FILE=test.profdata ./test >/dev/null; llvm-profdata show test.profdata -all-functions

The output of which is:

Counters:
  _Z1hv:
    Hash: 0x0000000000000000
    Counters: 1
    Function count: 3
  main:
    Hash: 0x0000000000000000
    Counters: 1
    Function count: 1
Functions shown: 2
Total functions: 2
Maximum function count: 3
Maximum internal block count: 0

llvm-cov doesn't complain though
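To compare the hashes stored in a profile against what llvm-cov reports, the `llvm-profdata show -all-functions` text can be reduced to a name-to-hash table. This is a sketch that assumes the output layout shown above (a bare `name:` line followed by a `Hash:` line) and mangled names without spaces; `parse_function_hashes` is a hypothetical helper name:

```python
import re

HASH_RE = re.compile(r"^Hash:\s*0x([0-9A-Fa-f]+)$")


def parse_function_hashes(text):
    """Map each function name to the list of hashes printed for it.

    A list is used in case the same name appears more than once.
    """
    hashes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()                 # tolerate indented or flat output
        m = HASH_RE.match(line)
        if m and current is not None:
            hashes.setdefault(current, []).append(int(m.group(1), 16))
        elif line.endswith(":") and " " not in line:
            # A bare "name:" line starts a new record. The "Counters:"
            # header also matches, but it never receives a hash.
            current = line[:-1]
    return hashes
```

Looking up a function named in a hash-mismatch message in this table shows which hash, if any, the profile actually holds for it.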

@vedantk
Collaborator Author

vedantk commented Oct 6, 2017

hash-mismatch: No profile record found for '_ZN7logging11CheckGEImplEiiPKc' with hash = 0x0

It's OK for functions to have a hash equal to 0. The question is, why does the profile record found for '_ZN7logging11CheckGEImplEiiPKc' not have a hash equal to 0?
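One way to picture the question is as a two-key lookup: the coverage mapping supplies a name and a hash, and the profile must hold a record under both. The following is a deliberately simplified model, not llvm-cov's actual code; the dictionary layout and the "unknown-function" label are assumptions made for illustration:

```python
def lookup(profile, name, func_hash):
    """Look up counters by (name, hash) in a toy profile.

    `profile` maps function name -> {hash: counters}. Returns a
    (status, counters) pair.
    """
    records = profile.get(name)
    if records is None:
        # No record at all under this name
        return ("unknown-function", None)
    if func_hash not in records:
        # The name exists, but only under other hashes
        return ("hash-mismatch", None)
    return ("ok", records[func_hash])
```

In this model a hash of 0 is as valid as any other key. A mismatch for hash 0x0 means the profile has the name recorded only under some nonzero hash, which is the situation that needs explaining.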

@Dor1s
Member

Dor1s commented Oct 17, 2017

Vedant, it looks like we were seeing that issue even with a fresh clang revision because we had some dynamic libraries linked in instead of using static linking. We cannot reproduce it anymore since we started explicitly enabling static linking for our targets. Thanks a lot for your help!

@vedantk
Collaborator Author

vedantk commented Oct 17, 2017

I'm glad you are unblocked but am worried that there are issues using dylibs. Please do let me know if you see more instances of suspicious hash mismatches, and I'll try and set aside more time to reduce the issue.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@kenkopelson

kenkopelson commented Nov 19, 2023

@vedantk, this is most definitely an issue, even now in 2023. The gentleman who said he saw the problem go away when doing static linking is only covering up the problem. In our project, we cannot use static linking, nor would we want to. I'm happy to help you diagnose the problem if I can. I'm simply trying to use the llvm-cov tool, which works great for static linking, but is having this problem with dynamic linking of all sorts... not just dylib on Mac. I'm on Windows, using DLL, and it also happens. I am using the latest versions of everything, using clang++ compiler, lld-link linker, and of course llvm-cov to process coverage instrumentation. Please advise on this, since it has been 6 years since the last communication here.

@kenkopelson

@vedantk I should also say that the coverage report for my DLL is missing the files in subdirectories. There is a main src directory in the DLL, and then several subdirectories off of it. The report only includes the classes in the main directory, and none of the ones in the subdirectories.

@qsantos

qsantos commented Dec 11, 2023

I am getting this warning in Rust, with cargo-llvm-cov:

~% cargo init --lib cov
~% cd cov
~/cov% cargo add tracing
~/cov% cat src/lib.rs
fn f() {
    tracing::debug!("");
}
~/cov% mkdir tests
~/cov% cat tests/test.rs
#[test]
fn f() {
    tracing::debug!("");
}
~/cov% cargo llvm-cov
…
warning: 1 functions have mismatched data

If that matters, this is on an Apple M3 Pro with Sonoma 14.1 (23B2073).

Edit: it does not seem to happen on Linux (Debian testing)

Note: If tracing-subscriber is in the dependencies, the message warns about 2 functions with mismatched data.


By expanding the macros, I was able to further reduce it to:

// src/lib.rs
fn f() {
    static __CALLSITE: ::tracing::callsite::DefaultCallsite = {
        static META: ::tracing::Metadata<'static> = ::tracing_core::metadata::Metadata::new(
            "",
            "",
            ::tracing::Level::DEBUG,
            None,
            None,
            None,
            ::tracing_core::field::FieldSet::new(
                &[""],
                ::tracing_core::callsite::Identifier(&__CALLSITE),
            ),
            ::tracing::metadata::Kind::EVENT,
        );
        ::tracing::callsite::DefaultCallsite::new(&META)
    };
}

And

// tests/test.rs
#[test]
pub fn f() {
    // warning: 2 functions have mismatched data
    // let _ = ::tracing::Level::DEBUG <= ::tracing::level_filters::LevelFilter::current();
    // warning: 1 functions have mismatched data
    ::tracing::level_filters::LevelFilter::current();
    // warning: 1 functions have mismatched data
    // let _ = ::tracing::Level::DEBUG <= ::tracing::Level::DEBUG;
}

The minimum working example for ::tracing::level_filters::LevelFilter::current() is shown below. Note that, somehow, it matters that at least one line is indented by strictly more than eight spaces:

    #[inline(always)]
    pub fn current() -> Self {
        // standard indent
            // non-standard indent
            Self::OFF
    }

@ntnx-aleksa

Hello,
I'm facing this problem on Linux (EL8), in tests that depend on instrumented shared objects. Symbols with hash mismatches are missing from llvm-cov's LCOV output. What makes things even trickier is that when I pass multiple shared objects to llvm-cov, I get completely different LCOV tracefiles as output: the number of symbols present in these files varies widely with the order of the object file arguments.
llvm-cov reports hash mismatches and that the symbols have a hash of 0, however dumping profdata files with llvm-profdata show --all-functions reveals that the hashes are not 0. Could this be a problem with how llvm-cov reads hashes from indexed profile files or profiling sections in the ELF? Is there some tool that can help inspect profiling sections in ELF files?
Thanks

@ntnx-aleksa

I've described my issue further on llvm-discourse

@agent00jackson

To others with this issue coming here from Google, -femit-all-decls worked for me as a workaround (mentioned in the above llvm-discourse link).

@krinkinmu

Hey folks,

it appears that we've also hit the same issue. I've described my situation in https://discourse.llvm.org/t/source-based-coverage-results-degradation-between-clang-14-and-clang-18/84502/4, and it took a while to connect the dots and find this report.

In our case, using -femit-all-decls does not work because it causes a build failure (one specific issue that I hit was inside clang's Intel AMX intrinsics, so it's in the libraries that ship with the compiler itself). The other workarounds are local (e.g., I can mark affected functions with the used attribute or move them from a header to a .cc file, which also seems to work), so they aren't really great: it's difficult to apply them consistently across a large codebase.

What makes things worse for us is that a lot of the coverage machinery is buried under layers of Bazel logic, so it's not straightforward to find this bug in the first place, because the only thing you see is a confusingly wrong coverage report.

I don't really know much about LLVM, but is there something I can help with to make progress and resolve this issue?

It appears that the last attempt to do something about it was #97574, and it didn't exactly work out. Given that it's been a while, I'm just looking for ways to make progress on this.
