New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
excessive stack usage for FORTIFY_SOURCE building the kernel #45147
Comments
assigned to @efriedma-quic |
I ran into a warning about exceeding the 2048 byte per-function stack limit on a kernel build with clang-10, and produced a reduced test case: int m; $ clang-10 -Wframe-larger-than=1 -O2 -c curve25519-hacl64-test.c The main culprit here seems to be related to the way that the kernel overrides the memcpy() function when CONFIG_FORTIFY_SOURCE is set, intentionally preventing the memcpy() from being inlined. However, with clang-9 or clang-11, the result is as expected, so this also seems to be a clang-10 specific regression: $ clang-9 -Wframe-larger-than=1 -O2 -c curve25519-hacl64-test.c $ clang-11 -Wframe-larger-than=1 -O2 -c curve25519-hacl64-test.c See also https://godbolt.org/z/Qicn0J for the output |
clang-10 appears to have a broken optimization stage that doesn't enable the compiler to prove at compile time that certain memcpys are within bounds, and thus the outline memcpy is always called, resulting in horrific performance, and in some cases, excessive stack frame growth. Here's a simple reproducer:
Compile this with clang-9 and clang-10 and observe: zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-10 -Wframe-larger-than=0 -O3 -c b.c -o c10.o Looking at the disassembly of c10.o and c9.o, one can see that c9.o is properly optimized in the obvious way one would expect, while c10.o has blown up and includes extern calls to memcpy. |
The issue was introduced with https://reviews.llvm.org/rGd437fba8ef626b6d8b7928540f630163a9b04021 and fixed by George in commits https://reviews.llvm.org/rG2dd17ff08165e6118e70f00e22b2c36d2d4e0a9a and https://reviews.llvm.org/rG94908088a831141cfbdd15fc5837dccf30cfeeb6. |
Tom, could those two patches be added to 10.0.1? |
For the record, in clang 9, the inline definition of memcpy was ignored by clang, as showcased by this example: https://godbolt.org/z/8D4iDJ Since clang-10, clang honors the inline definition, as illustrated by: https://godbolt.org/z/dmunLs. In the reduced test case proposed, clang-10 does not nonor the inline definition, because it's trivially recursive, and falls back to the memcpy intrinsic. I don't know if that's the intended behavior for the kernel, just wanted to provide more details. |
Eli, you reviewed this patch initially, any objections to merging this along with the follow up fix? https://reviews.llvm.org/rG2dd17ff08165e6118e70f00e22b2c36d2d4e0a9a |
LGTM |
Merged: eaae6df |
I've updated my compiler to the fixed version now, but it seems to run into a related problem in a few other files in the kernel with the new version. This (manually) reduced test case triggers a warning about large stack usage with the fortified memcpy() with clang-10 and clang-11 but not clang-9: $ clang-10 --target=arm-linux-gnueabi -c br_multicast.c -O2 - typedef unsigned int size_t; extern void attribute((noreturn)) fortify_panic(const char *name); extern inline attribute((gnu_inline)) struct br_mcast_stats {
}; struct bridge_mcast_stats { #define barrier() asm volatile("" : : : "memory") typedef struct seqcount { static inline unsigned read_seqcount_begin(const seqcount_t *s) static inline int read_seqcount_retry(const seqcount_t *s, unsigned start) extern void *per_cpu_ptr(void *ptr, unsigned int cpu); static void mcast_stats_add_dir(__u64 *dst, __u64 *src) void br_multicast_get_stats(struct bridge_mcast_stats *stats,
} |
The text was updated successfully, but these errors were encountered: