Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

excessive stack usage with kernel address sanitizer #38157

Open
arndb mannequin opened this issue Sep 3, 2018 · 27 comments
Open

excessive stack usage with kernel address sanitizer #38157

arndb mannequin opened this issue Sep 3, 2018 · 27 comments
Labels
bugzilla Issues migrated from bugzilla compiler-rt:asan Address sanitizer

Comments

@arndb
Copy link
Mannequin

arndb mannequin commented Sep 3, 2018

Bugzilla Link 38809
Version unspecified
OS Linux
Blocks #4440
Attachments linux/drivers/video/backlight/ltv350qv.c, preprocessed, reduced
CC @KernelAddress,@melver,@eugenis,@kcc,@lalozano,@maxim-kuvyrkov,@nathanchance,@nickdesaulniers,@rengolin,@stephenhines

Extended Description

Building the Linux kernel with clang KASAN enabled shows many warnings about possible stack overflow (we limit the frame size per function to 1024 to 2048 byte, depending on configuration, because the per-thread stack is very limited).

I created a reduced test case from one of the scarier warnings:

$ clang-8 ltv350qv.c --target=aarch64-linux-gnu  -c -O2 -Wframe-larger-than=500 -fsanitize=kernel-address  -Wall  -Wno-unused -Wno-sometimes-uninitialized -Werror -mllvm -asan-stack=1 -mllvm -asan-use-after-scope=0
ltv350qv.c:181:6: error: stack frame size of 1760 bytes in function 'fn6' [-Werror,-Wframe-larger-than=]
void fn6() {
     ^
ltv350qv.c:209:6: error: stack frame size of 10048 bytes in function 'fn7' [-Werror,-Wframe-larger-than=]
void fn7() {

I tested this using clang-8.0.0-svn341106-1exp1+020180830200353.1747~1.gbp19b9f6 on Ubuntu, but an old clang-3.9 shows the same behavior.

With gcc, the same function is fine with "asan-use-after-scope" disabled:

$ aarch-linux-gcc-8.0.1 -xc ltv350qv.c   -S -O2 -Wframe-larger-than=100 -fsanitize=kernel-address   -Wall  -Wno-unused -Wno-attributes  -fno-strict-aliasing --param asan-stack=1 -Werror  -fno-sanitize-address-use-after-scope
ltv350qv.c: In function 'fn5':
ltv350qv.c:180:1: error: the frame size of 448 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
 }
 ^
ltv350qv.c: In function 'fn6':
ltv350qv.c:208:1: error: the frame size of 512 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
 }
 ^
cc1: all warnings being treated as errors

but turning on asan-use-after-scope makes gcc as bad as clang, which is expected from the source code (the kernel turns it off by default for this reason):

ltv350qv.c: In function 'fn5':
ltv350qv.c:180:1: error: the frame size of 10000 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
 }
 ^
ltv350qv.c: In function 'fn6':
ltv350qv.c:208:1: error: the frame size of 1984 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
 }

Using -fsantize=address in place of -fsanitize=kernel-address completely avoids the high stack usage with clang, even with asan-use-after-scope enabled:

$ clang-8 ltv350qv.c --target=aarch64-linux-gnu  -c -O2 -Wframe-larger-than=64 -fsanitize=address  -Wall  -Wno-unused -Wno-sometimes-uninitialized -Werror -mllvm -asan-stack=1  -mllvm -asan-use-after-scope=1

ltv350qv.c:181:6: error: stack frame size of 96 bytes in function 'fn6' [-Werror,-Wframe-larger-than=]
void fn6() {
     ^
ltv350qv.c:209:6: error: stack frame size of 96 bytes in function 'fn7' [-Werror,-Wframe-larger-than=]
void fn7() {
@KernelAddress
Copy link
Mannequin

KernelAddress mannequin commented Sep 24, 2018

If I am reading this correctly for fn7 we do 100+x worse than gcc: 10048 for clang, not showing with limit 100 on gcc.

I am also confused by difference between -fsantize=address and -fsanitize=kernel-address. Shouldn't the code be the same except for shadow base?

@arndb
Copy link
Mannequin Author

arndb mannequin commented Sep 24, 2018

Looking at the assembler output, the difference here appears to be that this code from the reduced test case

void fn1(struct list_head *new) {
  union {
    typeof(&a) __c[1];
  } b = {{new}};
  *(volatile long *)b.__c;
}
void fn2(struct list_head *new) { fn1(new); }
void fn3(struct spi_transfer *t) { fn2(&t->transfer_list); }

gets completely optimized away by gcc, but not by clang, which produces a complex stack check from any call to fn3().

This is what it looked like in the original code:

static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) __attribute__((always_inline)) void __write_once_size(volatile void *p, void *res, int size)
{
 switch (size) {
 case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
 case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
 case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
 case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
 default:
  __asm__ __volatile__("" : : : "memory");
  __builtin_memcpy((void *)p, (const void *)res, size);
  __asm__ __volatile__("" : : : "memory");
 }
#define WRITE_ONCE(x, val) \
({                                                      \
        union { typeof(x) __val; char __c[1]; } __u =   \
                { .__val = (__force typeof(x)) (val) }; \ 
        __write_once_size(&(x), __u.__c, sizeof(x));    \
        __u.__val;                                      \ 
})
static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) bool __list_add_valid(struct list_head *new,
    struct list_head *prev,
    struct list_head *next)
{
 return true;
}
static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) void __list_add(struct list_head *new,
         struct list_head *prev,
         struct list_head *next)
{
 if (!__list_add_valid(new, prev, next))
  return;

 next->prev = new;
 new->next = next;
 new->prev = prev;
 WRITE_ONCE(prev->next, new);
}
static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) void list_add_tail(struct list_head *new, struct list_head *head)
{
 __list_add(new, head->prev, head);
}
static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) void
spi_message_add_tail(struct spi_transfer *t, struct spi_message *m)
{
 list_add_tail(&t->transfer_list, &m->transfers);
}

@KernelAddress
Copy link
Mannequin

KernelAddress mannequin commented Sep 24, 2018

Kostya, can you help to find somebody to work on this?

@arndb
Copy link
Mannequin Author

arndb mannequin commented Sep 24, 2018

I've done a new (guided) reduction of the original source code now, which got me to this test case:

struct list_head { struct list_head *next, *prev; };
static void __list_add(struct list_head *new, struct list_head *prev,
           struct list_head *next) {
  prev->next = new;
  next->prev = new;
  new->next = next;
  new->prev = prev;
}
struct spi_transfer {
  struct list_head transfer_list;
  char b[16];
};
struct spi_message {
  struct list_head transfers;
  char b[256];
};
static void
spi_message_add_tail(struct spi_transfer *t, struct spi_message *m) {
  __list_add(&t->transfer_list, &m->transfers, &m->transfers);
}
extern void spi_sync(struct spi_message *msg);
static void ltv350qv_write_reg(void) {
  struct spi_message msg;
  struct spi_transfer index_xfer;
  spi_message_add_tail(&index_xfer, &msg);
  spi_sync(&msg);
}
void ltv350qv_power_on(void) {
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
}

This uses 10944 bytes stack frame with clang -fsanitize=kernel-address, compare with 416 bytes with gcc, or 96 bytes using clang -fsanitize=address.

@eugenis
Copy link
Contributor

eugenis commented Sep 24, 2018

So this is the old problem of stack slot reuse not working with ASan.
In order to find use-after-scope bugs on, for example, "msg" local variable from the first inlined call to ltv350qv_write_reg, it has to stay unused and cannot overlap with the same variable from the following calls.

Simply disabling use-after-scope detection would not help, this is a fundamental part of asan implementation in LLVM.

@arndb
Copy link
Mannequin Author

arndb mannequin commented Sep 24, 2018

So this is the old problem of stack slot reuse not working with ASan.
In order to find use-after-scope bugs on, for example, "msg" local variable
from the first inlined call to ltv350qv_write_reg, it has to stay unused and
cannot overlap with the same variable from the following calls.

Simply disabling use-after-scope detection would not help, this is a
fundamental part of asan implementation in LLVM.

But why does it only affect -fsanitize=kernel-address and not -fsanitize=address then?

Disabling "asan-stack" entirely seems to work, but may be a bit heavy-handed. Are there other sub-options of asan-stack that can be left enabled while we turn off use-after-scope and other options?

For gcc, the kernel already turns off use-after-scope detection to work around the problem, but the normal asan-stack stays enabled.

@eugenis
Copy link
Contributor

eugenis commented Sep 24, 2018

Oh, sorry I misunderstood the question.
Userspace asan has the optional "fake stack" feature for use-after-return detection. Each function checks if it is enabled in prologue. This check makes the 10Kb stack allocation appear to be a dynamic alloca, and it does not trigger the warning. But the problem is still there.

@arndb
Copy link
Mannequin Author

arndb mannequin commented Feb 19, 2019

[preprocessed linuxhttps://user-images.githubusercontent.com/92601431/143758042-76d92e65-601e-4ca0-8fce-b0a8087b6c97.gz)
I'm preparing a patch to turn off asan-stack=1 in the kernel, which gets rid of almost all such warnings. One interesting instance remains in the attached preprocessed file (not reduced yet):

clang-8 -Wno-gnu -Wno-unused-argument -Wno-unused-value -Wno-sign-compare -Wframe-larger-than=1000 -O2 -fsanitize=kernel-address -mllvm -asan-stack=0 -c go7007-fw.i --target=aarch64-linux -Wno-unused-value -Wno-pointer-sign
/git/arm-soc/drivers/media/usb/go7007/go7007-fw.c:1551:5: warning: stack frame size of 2656 bytes in function
'go7007_construct_fw_image' [-Wframe-larger-than=]

Marking one function (do_special()) as attribute((no_inline)) avoids the issue in this file.

@kcc
Copy link
Contributor

kcc commented Feb 20, 2019

I am not confident we can fix this in clang/llvm.
The difference between gcc and clang, AFAICT, is not in the asan
instrumentation, but in inining.
Looking at the reproducer from #38157 #c4,
if I change ltv350qv_write_reg to always inline, gcc also produces a
huge 10K stack frame,
and making it noinline fixes the stack frame for both gcc and clang.

@eugenis
Copy link
Contributor

eugenis commented Feb 20, 2019

The problem with ltv350qv_write_reg could be fixed by tuning inlining heuristics. Right now they assume that inlining an alloca is basically free, while with asan it has a cost that is proportional to its size (for poisoning and unpoisoning, both in code size and run time). This does not address stack frame size directly, but it could be a decent proxy.

@arndb
Copy link
Mannequin Author

arndb mannequin commented Feb 20, 2019

The problem with ltv350qv_write_reg could be fixed by tuning inlining
heuristics. Right now they assume that inlining an alloca is basically free,
while with asan it has a cost that is proportional to its size (for
poisoning and unpoisoning, both in code size and run time). This does not
address stack frame size directly, but it could be a decent proxy.

I don't see an alloca() in the function, in fact we don't allow alloca() to be used in the kernel at all (we build with -Wvla).

I am not confident we can fix this in clang/llvm.
The difference between gcc and clang, AFAICT, is not in the asan
instrumentation, but in inining.
Looking at the reproducer from
#38157 #c4,
if I change ltv350qv_write_reg to always inline, gcc also produces a
huge 10K stack frame,
and making it noinline fixes the stack frame for both gcc and clang.

I tried using attribute((always_inline)) but do not see the gcc bug you are referring to:

https://godbolt.org/z/b7Fm4_

We had a related bug in older versions of gcc that would cause stack slots to not be reused when inlining certain functions, but I added workarounds for all instances in the kernel that hit that particular gcc bug, and newer gcc versions all contain that trivial fix (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c18)

@arndb
Copy link
Mannequin Author

arndb mannequin commented Feb 20, 2019

Here is another example (without inline functions) that I reduced from linux/drivers/crypto/ccp/ccp-ops.c:

https://godbolt.org/z/ylsGSQ

struct ccp_op {
  int jobid[50];
};
struct ccp_op fn1(void), a, b, c;
void fn2(struct ccp_op);
void fn3(int d) {
    fn2(a);
    fn1();
    fn2(b);
    fn2(b);
    fn1();
    fn2(a);
    fn1();
    fn2(b);
    fn2(b);
    fn1();
    fn1();
}

$ clang-8 -std=gnu89 -Wframe-larger-than=2048 --target=aarch64-linux -Wall -fsanitize=kernel-address -O2 -c ccp-ops.i
warning: stack frame size of 3320 bytes in function 'fn3' [-Wframe-larger-than=]
$ gcc-8 -std=gnu89 -Wframe-larger-than=100 -Wall -fsanitize=kernel-address --param asan-stack=1 -O2 -c ccp-ops.i
ccp-ops.i: In function ‘fn3’:
ccp-ops.i:73:1: warning: the frame size of 224 bytes is larger than 100 bytes [-Wframe-larger-than=]

@eugenis
Copy link
Contributor

eugenis commented Feb 21, 2019

Another idea: when use-after-scope is disabled, we could merge allocas with the same size and non-overlapping lifetimes before creating the mega-alloca. This would make asan reports a bit more ambiguous - stack frame dumps will need to mention multiple possibilities for certain offset ranges.

This will take a non-trivial amount of work.

Re: #​11, "alloca" stands for any stack allocation. There are static allocas (regular variables) and dynamic allocas (actual alloca() calls).

@arndb
Copy link
Mannequin Author

arndb mannequin commented Feb 22, 2019

Another idea: when use-after-scope is disabled, we could merge allocas with
the same size and non-overlapping lifetimes before creating the mega-alloca.
This would make asan reports a bit more ambiguous - stack frame dumps will
need to mention multiple possibilities for certain offset ranges.

This will take a non-trivial amount of work.

I would hope something simpler could be done, where the reporting does not have to be made more complicated. Trying to come up with a relevant example that is more like the kernel code I see require much more stack on clang vs gcc, take this program:

// https://godbolt.org/z/Ws2qLZ
struct s { int count; int a[500]; };
int __attribute__((noinline)) may_overflow(struct s *s)
{
        return s->a[s->count];
}
static inline __attribute__((always_inline)) int largestack(int i)
{
        struct s s;
        s.count = i;
        return may_overflow(&s);
}
int main(void)
{
        return largestack(2) +
               largestack(500) + 
               largestack(-2) + 
               largestack(0);
}

Building this with gcc-8 -g -Wall -O2 array.c -fsanitize=address --param asan-stack=1 -fno-sanitize-address-use-after-scope -Wframe-larger-than=10, I see 2KB stack usage and this output:

=================================================================
==3977944==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7ffe8caedbbc at pc 0x000000400a2d bp 0x7ffe8caedb80 sp 0x7ffe8caedb70
READ of size 4 at 0x7ffe8caedbbc thread T0
    #​0 0x400a2c in may_overflow /tmp/array.c:4
    #​1 0x400831 in largestack2 /tmp/array.c:16
    #​2 0x400831 in main /tmp/array.c:20
    #​3 0x7fcd2143282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #​4 0x400918 in _start (/tmp/a.out+0x400918)

Address 0x7ffe8caedbbc is located in stack of thread T0 at offset 28 in frame
    #​0 0x4007af in main /tmp/array.c:19

  This frame has 1 object(s):
    [32, 2036) 's' <== Memory access at offset 28 underflows this variable

Doing the same thing with clang-8 -g -Wall -O2 array.c -fsanitize=address -mllvm -asan-stack=1 -fno-sanitize-address-use-after-scope -Wframe-larger-than=10, I see 8KB stack usage, and this slightly more elaborate output:

=================================================================
==3977959==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffcfb36c354 at pc 0x0000004f7604 bp 0x7ffcfb36aa80 sp 0x7ffcfb36aa78
READ of size 4 at 0x7ffcfb36c354 thread T0
    #&#8203;0 0x4f7603 in may_overflow /tmp/array2.c:4:16
    #&#8203;1 0x4f7767 in largestack /tmp/array2.c:10:16
    #&#8203;2 0x4f7767 in main /tmp/array2.c:15
    #&#8203;3 0x7fdf0781882f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
    #&#8203;4 0x41a8d8 in _start (/tmp/a.out+0x41a8d8)

Address 0x7ffcfb36c354 is located in stack of thread T0 at offset 6324 in frame
    #&#8203;0 0x4f761f in main /tmp/array2.c:13

  This frame has 4 object(s):
    [32, 2036) 's.i12'
    [2176, 4180) 's.i9'
    [4320, 6324) 's.i6' <== Memory access at offset 6324 overflows this variable
    [6464, 8468) 's.i'

But since the four objects are all conceptually the same variable in the inline function, and I still have the line information, it doesn't seem to help much that there are now four objects listed. Is this something that can perhaps be changed more easily?

Note1: the -Wframe-larger-than= warning message seems wrong for "clang --sanitize=address" (it reports 88 bytes instead of 8KB), but it looks correct with either --sanitize=kernel-address or without asan.

Note2: gcc also merges distinct objects of the same size, and reports a random one in the output if they are of the same type. I assume you'd consider that a bug in gcc, but I also think that it's better than using more stack space, at least in the context of the Linux kernel, where the available stack space is very constrained.

@nickdesaulniers
Copy link
Member

I am not confident we can fix this in clang/llvm.
The difference between gcc and clang, AFAICT, is not in the asan
instrumentation, but in inining.
Looking at the reproducer from
#38157 #c4,
if I change ltv350qv_write_reg to always inline, gcc also produces a
huge 10K stack frame,
and making it noinline fixes the stack frame for both gcc and clang.

Would it be worthwhile for us to limit inlining when building with LLVM?

I see a few options in llvm/lib/Analysis/InlineCost.cpp. Perhaps by playing with -mllvm -inline-threshold=<number>, we could reduce the amount of inlining that occurs? I assume that won't break attribute((always_inline)) (when it succeeds), and would help limit the frame sizes from growing too much? Probably won't run as fast, but I think folks are willing to accept that tradeoff.

If we find that that is generally helpful for this very painful and constant source of bugreports, would it be possible for LLVM to simply update that value when using -fsanitize=address?

@KernelAddress
Copy link
Mannequin

KernelAddress mannequin commented Feb 1, 2021

Based on #​4 and #​5 the problem is with stack slot reuse.
Reducing inlining will of course help to some degree in some cases, but at the same time won't address other cases (cheap functions with large stack frames). I would assume the current inlining model doesn't take stack use into account. We could change inlining cost model to include stack use under asan, don't know how much complexity it will add.

@eugenis
Copy link
Contributor

eugenis commented Sep 13, 2021

If it is the warning that's a problem, not the actual stack usage, we could use the dynamic alloca approach in the kernel-address mode, too.

Tweaking the inlining model does not sound very scary to me, and it might help. Worth a try.

@arndb
Copy link
Mannequin Author

arndb mannequin commented Oct 7, 2021

As discussed in a video meeting yesterday, I have done a few builds with my 'randconfig' kernel tree that has bugfixes for known problems, to isolate files that exceed the warning limit with clang but not gcc when using KASAN_STACK, nor when using clang using KASAN but no KASAN_STACK.

For x86_64 and arm64 allmodconfig builds, I got a relatively compact list of affected files that should be easily reproducible using clang-13 on linux-5.15-rc2 or newer with the allmodconfig builds. All warnings that are not in this list have a pending kernel fix or workaround. Listing only one warning per file for the worst cases, anything with more than 3KB compared to less than 2KB for the other builds:

crypto/ecc.c:1276:13: error: stack frame size (3456) exceeds limit (2048) in function 'ecc_point_mult' [-Werror,-Wframe-drivers/gpu/drm/i915/gt/intel_workarounds.c:1608:1: error: stack frame size (3512) exceeds limit (2048) in function 'rcs_engine_wa_init' [-Werror,-Wframe-larger-than]
drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgk104.c:1521:1: error: stack frame size (5856) exceeds limit (2048) in function 'gk104_ram_new_' [-Werror,-Wframe-larger-than]
drivers/hwmon/occ/common.c:1150:5: error: stack frame size (3232) exceeds limit (2048) in function 'occ_setup' [-Werror,-Wframe-larger-than]
drivers/infiniband/hw/ocrdma/ocrdma_stats.c:686:16: error: stack frame size (20704) exceeds limit (2048) in function 'ocrdma_dbgfs_ops_read' [-Werror,-Wframe-larger-than]
drivers/media/common/cx2341x.c:1574:5: error: stack frame size (3072) exceeds limit (2048) in function 'cx2341x_handler_init' [-Werror,-Wframe-larger-than]
drivers/media/i2c/cx25840/cx25840-core.c:2294:12: error: stack frame size (3200) exceeds limit (2048) in function 'cx25840_reset' [-Werror,-Wframe-larger-than]
drivers/media/pci/ddbridge/ddbridge-core.c:2365:6: error: stack frame size (3488) exceeds limit (2048) in function 'ddb_ports_init' [-Werror,-Wframe-larger-than]
drivers/media/usb/gspca/sn9c2028.c:802:12: error: stack frame size (3136) exceeds limit (2048) in function 'sd_start' [-Werror,-Wframe-larger-than]
drivers/misc/lattice-ecp3-config.c:65:13: error: stack frame size (2080) exceeds limit (2048) in function 'firmware_load' [-Werror,-Wframe-larger-than]
drivers/net/wan/slic_ds26522.c:203:12: error: stack frame size (16120) exceeds limit (2048) in function 'slic_ds26522_probe' [-Werror,-Wframe-larger-than]
drivers/net/wireless/ath/ath11k/qmi.c:2695:13: error: stack frame size (4416) exceeds limit (2048) in function 'ath11k_qmi_driver_event_work' [-Werror,-Wframe-larger-than]
drivers/net/wireless/atmel/atmel.c:1305:5: error: stack frame size (5056) exceeds limit (2048) in function 'atmel_open' [-Werror,-Wframe-larger-than]
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17019:13: error: stack frame size (6400) exceeds limit (2048) in function 'wlc_phy_workarounds_nphy' [-Werror,-Wframe-larger-than]
drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c:855:5: error: stack frame size (4640) exceeds limit (2048) in function 'iwl_mvm_ftm_start' [-Werror,-Wframe-larger-than]
drivers/scsi/qla2xxx/qla_bsg.c:2787:1: error: stack frame size (3200) exceeds limit (2048) in function 'qla2x00_process_vendor_specific' [-Werror,-Wframe-larger-than]
drivers/staging/media/hantro/hantro_g2_hevc_dec.c:536:5: error: stack frame size (3648) exceeds limit (2048) in function 'hantro_g2_hevc_dec_run' [-Werror,-Wframe-larger-than]
drivers/staging/pi433/rf69.c:703:5: error: stack frame size (2080) exceeds limit (2048) in function 'rf69_set_sync_values' [-Werror,-Wframe-larger-than]
drivers/staging/rtl8723bs/core/rtw_security.c:1288:5: error: stack frame size (6592) exceeds limit (2048) in function 'rtw_aes_decrypt' [-Werror,-Wframe-larger-than]
drivers/staging/wlan-ng/cfg80211.c:436:12: error: stack frame size (3904) exceeds limit (2048) in function 'prism2_connect' [-Werror,-Wframe-larger-than]
drivers/usb/misc/sisusbvga/sisusb.c:1878:12: error: stack frame size (4000) exceeds limit (2048) in function 'sisusb_init_gfxcore' [-Werror,-Wframe-larger-than]
drivers/video/fbdev/omap2/omapfb/displays/panel-lgphilips-lb035q02.c:117:12: error: stack frame size (15136) exceeds limit (2048) in function 'lb035q02_connect' [-Werror,-Wframe-larger-than]
drivers/visorbus/visorchipset.c:1499:13: error: stack frame size (3000) exceeds limit (2048) in function 'controlvm_periodic_work' [-Werror,-Wframe-larger-than]
lib/bitfield_kunit.c:60:20: error: stack frame size (12224) exceeds limit (2048) in function 'test_bitfields_constants' [-Werror,-Wframe-larger-than]
lib/test_kasan.c:946:13: error: stack frame size (3224) exceeds limit (2048) in function 'kasan_bitops_generic' [-Werror,-Wframe-larger-than]
lib/test_scanf.c:217:20: error: stack frame size (3704) exceeds limit (2048) in function 'numbers_simple' [-Werror,-Wframe-larger-than]
lib/test_scanf.c:217:20: error: stack frame size (5024) exceeds limit (2048) in function 'numbers_simple' [-Werror,-Wframe-larger-than]
sound/usb/mixer_s1810c.c:543:5: error: stack frame size (2072) exceeds limit (2048) in function 'snd_sc1810_init_mixer' [-Werror,-Wframe-larger-than]

For 32-bit arm, the list was much longer even when applying the same 3KB limit. Looking only at files that are not already in the list above, the worst ones I see are

drivers/media/dvb-frontends/drxd_hard.c:2857:12: error: stack frame size (7584) exceeds limit (1400) in function 'drxd_set_frontend' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/nxt200x.c:519:12: error: stack frame size (6848) exceeds limit (1400) in function 'nxt200x_setup_frontend_parameters' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/nxt200x.c:915:12: error: stack frame size (8192) exceeds limit (1400) in function 'nxt2004_init' [-Werror,-Wframe-larger-than]
drivers/net/usb/r8152.c:7158:13: error: stack frame size (6048) exceeds limit (1400) in function 'r8156_hw_phy_cfg' [-Werror,-Wframe-larger-than]
drivers/net/usb/r8152.c:7503:13: error: stack frame size (8768) exceeds limit (1400) in function 'r8156b_hw_phy_cfg' [-Werror,-Wframe-larger-than]
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9012:13: error: stack frame size (9952) exceeds limit (1400) in function 'rt2800_init_rfcsr' [-Werror,-Wframe-larger-than]

I also happened to run some more arm32 randconfig builds, which showed additional issues. I can provide .config files for these if anyone wants to take a look, the worst ones here are:

drivers/base/test/property-entry-test.c:117:13: error: stack frame size 5600 exceeds limit 1400 in function 'pe_test_uint_arrays' [-Werror,-Wframe-larger-than]
drivers/gpu/drm/panel/panel-samsung-ld9040.c:239:12: error: stack frame size 5472 exceeds limit 1400 in function 'ld9040_prepare' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/mb86a20s.c:1562:12: error: stack frame size 7584 exceeds limit 1400 in function 'mb86a20s_get_stats' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/nxt200x.c:1087:12: error: stack frame size 17312 exceeds limit 1400 in function 'nxt200x_init' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/stv0367.c:2540:12: error: stack frame size 9696 exceeds limit 1400 in function 'stv0367cab_set_frontend' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/stv090x.c:1976:12: error: stack frame size 12416 exceeds limit 1400 in function 'stv090x_blind_search' [-Werror,-Wframe-larger-than]
drivers/media/i2c/adv7842.c:2853:12: error: stack frame size 15712 exceeds limit 1400 in function 'adv7842_log_status' [-Werror,-Wframe-larger-than]
drivers/media/i2c/cx25840/cx25840-core.c:2294:12: error: stack frame size 5088 exceeds limit 1400 in function 'cx25840_reset' [-Werror,-Wframe-larger-than]
drivers/usb/gadget/udc/max3420_udc.c:841:12: error: stack frame size 9376 exceeds limit 1400 in function 'max3420_thread' [-Werror,-Wframe-larger-than]
drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:185:12: error: stack frame size 10976 exceeds limit 1400 in function 'td028ttec1_panel_enable' [-Werror,-Wframe-larger-than]
drivers/thunderbolt/test.c:2403:13: error: stack frame size 8256 exceeds limit 1400 in function 'tb_test_credit_alloc_all' [-Werror,-Wframe-larger-than]

@arndb
Copy link
Mannequin Author

arndb mannequin commented Oct 7, 2021

To broadly classify the ones I've listed in comment #​18:

  • Most of the warnings are in rarely used and older drivers, which makes them somewhat lower priority to investigate, but it also means that they would not have been caught at runtime

  • Several affected files use the "kunit" framework. This hits a similar problem with gcc when using the CONFIG_GCC_PLUGIN_STRUCTLEAK option, but not when using CONFIG_KASAN_STACK. We have worked around the problem for gcc by turning off the structleak feature for those files. This affects drivers/thunderbolt/test.c,
    drivers/base/test/property-entry-test.c and lib/test_kasan.c among others.

  • There is another cluster in drivers/media/, with functions that (after inlining) have numerous calls to a function such as

int cx25840_write(struct i2c_client *client, u16 addr, u8 value)
{
        u8 buffer[3];

        buffer[0] = addr >> 8;
        buffer[1] = addr & 0xff;
        buffer[2] = value;
        return i2c_master_send(client, buffer, 3);
}

I think this is a case of adding way much more padding around each 3-byte array than gcc does, so this could be addressed for reducing the amount of padding for small stack variables, or it could be addressed by not inlining cx25840_write().
Another option might be if we can annotate i2c_master_send() in a way that tells the compiler that it does not access more than N bytes from the buffer in argument 2, with N being passed as argument 3.

@edwintorok
Copy link
Contributor

mentioned in issue #4440

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@nickdesaulniers
Copy link
Member

Cross referencing downstream issue: ClangBuiltLinux/linux#39

@nickdesaulniers
Copy link
Member

The difference in #38157 (comment) between -fsanitize=address vs -fsanitize=kernel-address in clang seems to be that the former calls __asan_stack_malloc_8 with 10848 if __asan_option_detect_stack_use_after_return is not equal to zero, and has an 10848B alloca otherwise.
https://godbolt.org/z/s9a8s1neG
This looks to me like -Wframe-larger-than=100 is perhaps broken in clang for --target=aarch64-linux-gnu.

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Aug 7, 2023

Oh, it seems neither compiler warns for conditional alloca (with -Wframe-larger-than=): https://godbolt.org/z/dG71KEjqj That's...surprising.

EDIT: seems GCC has -Walloca-larger-than= for this; clang does not.

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Aug 9, 2023

@arndb also mentioned -Wstack-usage= might be of interest to the Linux kernel.

nickdesaulniers pushed a commit that referenced this issue Aug 16, 2023
…argument allocas

This reverts commit e26c24b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: #38157
Link: #41896
Link: #43598
Link: ClangBuiltLinux/linux#39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
@nickdesaulniers
Copy link
Member

I reapplied https://reviews.llvm.org/rGe698695fbbf62e6676f8907665187f2d2c4d814b to clang-18 which should help slightly (specifically, the cases where structs were passed by value).

@nickdesaulniers
Copy link
Member

reverted (But I don't plan to reopen all of the bugs with the same root cause): #43598 (comment)

razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 2, 2023
…argument allocas

This reverts commit e26c24b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: llvm#38157
Link: llvm#41896
Link: llvm#43598
Link: ClangBuiltLinux/linux#39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 2, 2023
…argument allocas

This reverts commit e26c24b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: llvm#38157
Link: llvm#41896
Link: llvm#43598
Link: ClangBuiltLinux/linux#39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 2, 2023
…argument allocas

This reverts commit e26c24b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: llvm#38157
Link: llvm#41896
Link: llvm#43598
Link: ClangBuiltLinux/linux#39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 3, 2023
…argument allocas

This reverts commit e26c24b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: llvm#38157
Link: llvm#41896
Link: llvm#43598
Link: ClangBuiltLinux/linux#39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 3, 2023
…argument allocas

This reverts commit e26c24b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: llvm#38157
Link: llvm#41896
Link: llvm#43598
Link: ClangBuiltLinux/linux#39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 6, 2023
…argument allocas

This reverts commit e26c24b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: llvm#38157
Link: llvm#41896
Link: llvm#43598
Link: ClangBuiltLinux/linux#39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
@nickdesaulniers
Copy link
Member

#68746
#68747

razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 11, 2023
…argument allocas

This reverts commit e26c24b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: llvm#38157
Link: llvm#41896
Link: llvm#43598
Link: ClangBuiltLinux/linux#39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla compiler-rt:asan Address sanitizer
Projects
None yet
Development

No branches or pull requests

4 participants