LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 38809 - excessive stack usage with kernel address sanitizer
Summary: excessive stack usage with kernel address sanitizer
Status: NEW
Alias: None
Product: compiler-rt
Classification: Unclassified
Component: asan (show other bugs)
Version: unspecified
Hardware: PC Linux
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks: 4068
  Show dependency tree
 
Reported: 2018-09-03 02:04 PDT by Arnd Bergmann
Modified: 2021-10-07 06:35 PDT (History)
11 users (show)

See Also:
Fixed By Commit(s):


Attachments
linux/drivers/video/backlight/ltv350qv.c, preprocessed, reduced (3.41 KB, text/x-csrc)
2018-09-03 02:04 PDT, Arnd Bergmann
Details
preprocessed linux/drivers/media/usb/go7007/go7007-fw.c (281.14 KB, application/gzip)
2019-02-19 07:22 PST, Arnd Bergmann
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Arnd Bergmann 2018-09-03 02:04:11 PDT
Created attachment 20823 [details]
linux/drivers/video/backlight/ltv350qv.c, preprocessed, reduced

Building the Linux kernel with clang KASAN enabled shows many warnings about possible stack overflow (we limit the frame size per function to 1024 to 2048 byte, depending on configuration, because the per-thread stack is very limited).

I created a reduced test case from one of the scarier warnings:

$ clang-8 ltv350qv.c --target=aarch64-linux-gnu  -c -O2 -Wframe-larger-than=500 -fsanitize=kernel-address  -Wall  -Wno-unused -Wno-sometimes-uninitialized -Werror -mllvm -asan-stack=1 -mllvm -asan-use-after-scope=0
ltv350qv.c:181:6: error: stack frame size of 1760 bytes in function 'fn6' [-Werror,-Wframe-larger-than=]
void fn6() {
     ^
ltv350qv.c:209:6: error: stack frame size of 10048 bytes in function 'fn7' [-Werror,-Wframe-larger-than=]
void fn7() {

I tested this using clang-8.0.0-svn341106-1~exp1+0~20180830200353.1747~1.gbp19b9f6 on Ubuntu, but an old clang-3.9 shows the same behavior.

With gcc, the same function is fine with "asan-use-after-scope" disabled:

$ aarch-linux-gcc-8.0.1 -xc ltv350qv.c   -S -O2 -Wframe-larger-than=100 -fsanitize=kernel-address   -Wall  -Wno-unused -Wno-attributes  -fno-strict-aliasing --param asan-stack=1 -Werror  -fno-sanitize-address-use-after-scope
ltv350qv.c: In function 'fn5':
ltv350qv.c:180:1: error: the frame size of 448 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
 }
 ^
ltv350qv.c: In function 'fn6':
ltv350qv.c:208:1: error: the frame size of 512 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
 }
 ^
cc1: all warnings being treated as errors

but turning on asan-use-after-scope makes gcc as bad as clang, which is expected from the source code (the kernel turns it off by default for this reason):
ltv350qv.c: In function 'fn5':
ltv350qv.c:180:1: error: the frame size of 10000 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
 }
 ^
ltv350qv.c: In function 'fn6':
ltv350qv.c:208:1: error: the frame size of 1984 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
 }

Using -fsantize=address in place of -fsanitize=kernel-address completely avoids the high stack usage with clang, even with asan-use-after-scope enabled:

clang-8 ltv350qv.c --target=aarch64-linux-gnu  -c -O2 -Wframe-larger-than=64 -fsanitize=address  -Wall  -Wno-unused -Wno-sometimes-uninitialized -Werror -mllvm -asan-stack=1  -mllvm -asan-use-after-scope=1

ltv350qv.c:181:6: error: stack frame size of 96 bytes in function 'fn6' [-Werror,-Wframe-larger-than=]
void fn6() {
     ^
ltv350qv.c:209:6: error: stack frame size of 96 bytes in function 'fn7' [-Werror,-Wframe-larger-than=]
void fn7() {
Comment 1 Dmitry Vyukov 2018-09-24 01:02:30 PDT
If I am reading this correctly for fn7 we do 100+x worse than gcc: 10048 for clang, not showing with limit 100 on gcc.

I am also confused by difference between -fsantize=address and -fsanitize=kernel-address. Shouldn't the code be the same except for shadow base?
Comment 2 Arnd Bergmann 2018-09-24 03:59:46 PDT
Looking at the assembler output, the difference here appears to be that this code from the reduced test case

void fn1(struct list_head *new) {
  union {
    typeof(&a) __c[1];
  } b = {{new}};
  *(volatile long *)b.__c;
}
void fn2(struct list_head *new) { fn1(new); }
void fn3(struct spi_transfer *t) { fn2(&t->transfer_list); }

gets completely optimized away by gcc, but not by clang, which produces a complex stack check from any call to fn3().

This is what it looked like in the original code:

static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) __attribute__((always_inline)) void __write_once_size(volatile void *p, void *res, int size)
{
 switch (size) {
 case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
 case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
 case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
 case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
 default:
  __asm__ __volatile__("" : : : "memory");
  __builtin_memcpy((void *)p, (const void *)res, size);
  __asm__ __volatile__("" : : : "memory");
 }
#define WRITE_ONCE(x, val) \
({                                                      \
        union { typeof(x) __val; char __c[1]; } __u =   \
                { .__val = (__force typeof(x)) (val) }; \ 
        __write_once_size(&(x), __u.__c, sizeof(x));    \
        __u.__val;                                      \ 
})
static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) bool __list_add_valid(struct list_head *new,
    struct list_head *prev,
    struct list_head *next)
{
 return true;
}
static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) void __list_add(struct list_head *new,
         struct list_head *prev,
         struct list_head *next)
{
 if (!__list_add_valid(new, prev, next))
  return;

 next->prev = new;
 new->next = next;
 new->prev = prev;
 WRITE_ONCE(prev->next, new);
}
static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) void list_add_tail(struct list_head *new, struct list_head *head)
{
 __list_add(new, head->prev, head);
}
static inline __attribute__((always_inline, unused)) __attribute__((no_instrument_function)) void
spi_message_add_tail(struct spi_transfer *t, struct spi_message *m)
{
 list_add_tail(&t->transfer_list, &m->transfers);
}
Comment 3 Dmitry Vyukov 2018-09-24 04:01:44 PDT
Kostya, can you help to find somebody to work on this?
Comment 4 Arnd Bergmann 2018-09-24 08:58:05 PDT
I've done a new (guided) reduction of the original source code now, which got me to this test case:

struct list_head { struct list_head *next, *prev; };
static void __list_add(struct list_head *new, struct list_head *prev,
           struct list_head *next) {
  prev->next = new;
  next->prev = new;
  new->next = next;
  new->prev = prev;
}
struct spi_transfer {
  struct list_head transfer_list;
  char b[16];
};
struct spi_message {
  struct list_head transfers;
  char b[256];
};
static void
spi_message_add_tail(struct spi_transfer *t, struct spi_message *m) {
  __list_add(&t->transfer_list, &m->transfers, &m->transfers);
}
extern void spi_sync(struct spi_message *msg);
static void ltv350qv_write_reg(void) {
  struct spi_message msg;
  struct spi_transfer index_xfer;
  spi_message_add_tail(&index_xfer, &msg);
  spi_sync(&msg);
}
void ltv350qv_power_on(void) {
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
  ltv350qv_write_reg();
}

This uses 10944 bytes stack frame with clang -fsanitize=kernel-address, compare with 416 bytes with gcc, or 96 bytes using clang -fsanitize=address.
Comment 5 Evgenii Stepanov 2018-09-24 13:44:29 PDT
So this is the old problem of stack slot reuse not working with ASan.
In order to find use-after-scope bugs on, for example, "msg" local variable from the first inlined call to ltv350qv_write_reg, it has to stay unused and cannot overlap with the same variable from the following calls.

Simply disabling use-after-scope detection would not help, this is a fundamental part of asan implementation in LLVM.
Comment 6 Arnd Bergmann 2018-09-24 14:13:23 PDT
(In reply to Evgenii Stepanov from comment #5)
> So this is the old problem of stack slot reuse not working with ASan.
> In order to find use-after-scope bugs on, for example, "msg" local variable
> from the first inlined call to ltv350qv_write_reg, it has to stay unused and
> cannot overlap with the same variable from the following calls.
> 
> Simply disabling use-after-scope detection would not help, this is a
> fundamental part of asan implementation in LLVM.

But why does it only affect -fsanitize=kernel-address and not -fsanitize=address then?

Disabling "asan-stack" entirely seems to work, but may be a bit heavy-handed. Are there other sub-options of asan-stack that can be left enabled while we turn off use-after-scope and other options?

For gcc, the kernel already turns off use-after-scope detection to work around the problem, but the normal asan-stack stays enabled.
Comment 7 Evgenii Stepanov 2018-09-24 14:22:32 PDT
Oh, sorry I misunderstood the question.
Userspace asan has the optional "fake stack" feature for use-after-return detection. Each function checks if it is enabled in prologue. This check makes the 10Kb stack allocation appear to be a dynamic alloca, and it does not trigger the warning. But the problem is still there.
Comment 8 Arnd Bergmann 2019-02-19 07:22:56 PST
Created attachment 21490 [details]
preprocessed linux/drivers/media/usb/go7007/go7007-fw.c

I'm preparing a patch to turn off asan-stack=1 in the kernel, which gets rid of almost all such warnings. One interesting instance remains in the attached preprocessed file (not reduced yet):

clang-8 -Wno-gnu -Wno-unused-argument -Wno-unused-value -Wno-sign-compare  -Wframe-larger-than=1000 -O2  -fsanitize=kernel-address -mllvm -asan-stack=0  -c go7007-fw.i --target=aarch64-linux -Wno-unused-value -Wno-pointer-sign 
/git/arm-soc/drivers/media/usb/go7007/go7007-fw.c:1551:5: warning: stack frame size of 2656 bytes in function
      'go7007_construct_fw_image' [-Wframe-larger-than=]

Marking one function (do_special()) as __attribute__((no_inline)) avoids the issue in this file.
Comment 9 Kostya Serebryany 2019-02-19 16:39:16 PST
I am not confident we can fix this in clang/llvm.
The difference between gcc and clang, AFAICT, is not in the asan
instrumentation, but in inining.
Looking at the reproducer from https://bugs.llvm.org/show_bug.cgi?id=38809#c4,
if I change ltv350qv_write_reg to always inline, gcc also produces a
huge 10K stack frame,
and making it noinline fixes the stack frame for both gcc and clang.
Comment 10 Evgenii Stepanov 2019-02-19 18:10:10 PST
The problem with ltv350qv_write_reg could be fixed by tuning inlining heuristics. Right now they assume that inlining an alloca is basically free, while with asan it has a cost that is proportional to its size (for poisoning and unpoisoning, both in code size and run time). This does not address stack frame size directly, but it could be a decent proxy.
Comment 11 Arnd Bergmann 2019-02-20 00:36:55 PST
(In reply to Evgenii Stepanov from comment #10)
> The problem with ltv350qv_write_reg could be fixed by tuning inlining
> heuristics. Right now they assume that inlining an alloca is basically free,
> while with asan it has a cost that is proportional to its size (for
> poisoning and unpoisoning, both in code size and run time). This does not
> address stack frame size directly, but it could be a decent proxy.

I don't see an alloca() in the function, in fact we don't allow alloca() to be used in the kernel at all (we build with -Wvla). 

(In reply to Kostya Serebryany from comment #9)
> I am not confident we can fix this in clang/llvm.
> The difference between gcc and clang, AFAICT, is not in the asan
> instrumentation, but in inining.
> Looking at the reproducer from
> https://bugs.llvm.org/show_bug.cgi?id=38809#c4,
> if I change ltv350qv_write_reg to always inline, gcc also produces a
> huge 10K stack frame,
> and making it noinline fixes the stack frame for both gcc and clang.

I tried using __attribute__((always_inline)) but do not see the gcc bug you are referring to:

https://godbolt.org/z/b7Fm4_

We had a related bug in older versions of gcc that would cause stack slots to not be reused when inlining certain functions, but I added workarounds for all instances in the kernel that hit that particular gcc bug, and newer gcc versions all contain that trivial fix (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c18)
Comment 12 Arnd Bergmann 2019-02-20 04:20:47 PST
Here is another example (without inline functions) that I reduced from linux/drivers/crypto/ccp/ccp-ops.c:

https://godbolt.org/z/ylsGSQ

struct ccp_op {
  int jobid[50];
};
struct ccp_op fn1(void), a, b, c;
void fn2(struct ccp_op);
void fn3(int d) {
    fn2(a);
    fn1();
    fn2(b);
    fn2(b);
    fn1();
    fn2(a);
    fn1();
    fn2(b);
    fn2(b);
    fn1();
    fn1();
}

$ clang-8 -std=gnu89  -Wframe-larger-than=2048 --target=aarch64-linux -Wall -fsanitize=kernel-address -O2 -c ccp-ops.i 
warning: stack frame size of 3320 bytes in function 'fn3' [-Wframe-larger-than=]
$ gcc-8 -std=gnu89  -Wframe-larger-than=100 -Wall -fsanitize=kernel-address  --param asan-stack=1 -O2 -c ccp-ops.i
ccp-ops.i: In function ‘fn3’:
ccp-ops.i:73:1: warning: the frame size of 224 bytes is larger than 100 bytes [-Wframe-larger-than=]
Comment 13 Evgenii Stepanov 2019-02-20 17:18:58 PST
Another idea: when use-after-scope is disabled, we could merge allocas with the same size and non-overlapping lifetimes before creating the mega-alloca. This would make asan reports a bit more ambiguous - stack frame dumps will need to mention multiple possibilities for certain offset ranges.

This will take a non-trivial amount of work.

Re: #11, "alloca" stands for any stack allocation. There are static allocas (regular variables) and dynamic allocas (actual alloca() calls).
Comment 14 Arnd Bergmann 2019-02-22 09:18:56 PST
(In reply to Evgenii Stepanov from comment #13)
> Another idea: when use-after-scope is disabled, we could merge allocas with
> the same size and non-overlapping lifetimes before creating the mega-alloca.
> This would make asan reports a bit more ambiguous - stack frame dumps will
> need to mention multiple possibilities for certain offset ranges.
> 
> This will take a non-trivial amount of work.

I would hope something simpler could be done, where the reporting does not have to be made more complicated. Trying to come up with a relevant example that is more like the kernel code I see require much more stack on clang vs gcc, take this program: 

// https://godbolt.org/z/Ws2qLZ
struct s { int count; int a[500]; };
int __attribute__((noinline)) may_overflow(struct s *s)
{
        return s->a[s->count];
}
static inline __attribute__((always_inline)) int largestack(int i)
{
        struct s s;
        s.count = i;
        return may_overflow(&s);
}
int main(void)
{
        return largestack(2) +
               largestack(500) + 
               largestack(-2) + 
               largestack(0);
}

Building this with "gcc-8 -g -Wall -O2 array.c -fsanitize=address --param asan-stack=1 -fno-sanitize-address-use-after-scope -Wframe-larger-than=10", I see 2KB stack usage and this output:

=================================================================
==3977944==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7ffe8caedbbc at pc 0x000000400a2d bp 0x7ffe8caedb80 sp 0x7ffe8caedb70
READ of size 4 at 0x7ffe8caedbbc thread T0
    #0 0x400a2c in may_overflow /tmp/array.c:4
    #1 0x400831 in largestack2 /tmp/array.c:16
    #2 0x400831 in main /tmp/array.c:20
    #3 0x7fcd2143282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #4 0x400918 in _start (/tmp/a.out+0x400918)

Address 0x7ffe8caedbbc is located in stack of thread T0 at offset 28 in frame
    #0 0x4007af in main /tmp/array.c:19

  This frame has 1 object(s):
    [32, 2036) 's' <== Memory access at offset 28 underflows this variable

Doing the same thing with "clang-8 -g -Wall -O2 array.c -fsanitize=address -mllvm -asan-stack=1 -fno-sanitize-address-use-after-scope -Wframe-larger-than=10", I see 8KB stack usage, and this slightly more elaborate output:

=================================================================
==3977959==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffcfb36c354 at pc 0x0000004f7604 bp 0x7ffcfb36aa80 sp 0x7ffcfb36aa78
READ of size 4 at 0x7ffcfb36c354 thread T0
    #0 0x4f7603 in may_overflow /tmp/array2.c:4:16
    #1 0x4f7767 in largestack /tmp/array2.c:10:16
    #2 0x4f7767 in main /tmp/array2.c:15
    #3 0x7fdf0781882f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
    #4 0x41a8d8 in _start (/tmp/a.out+0x41a8d8)

Address 0x7ffcfb36c354 is located in stack of thread T0 at offset 6324 in frame
    #0 0x4f761f in main /tmp/array2.c:13

  This frame has 4 object(s):
    [32, 2036) 's.i12'
    [2176, 4180) 's.i9'
    [4320, 6324) 's.i6' <== Memory access at offset 6324 overflows this variable
    [6464, 8468) 's.i'

But since the four objects are all conceptually the same variable in the inline function, and I still have the line information, it doesn't seem to help much that there are now four objects listed. Is this something that can perhaps be changed more easily?

Note1: the -Wframe-larger-than= warning message seems wrong for "clang --sanitize=address" (it reports 88 bytes instead of 8KB), but it looks correct with either --sanitize=kernel-address or without asan.

Note2: gcc also merges distinct objects of the same size, and reports a random one in the output if they are of the same type. I assume you'd consider that a bug in gcc, but I also think that it's better than using more stack space, at least in the context of the Linux kernel, where the available stack space is very constrained.
Comment 15 Nick Desaulniers 2021-01-29 22:56:28 PST
(In reply to Kostya Serebryany from comment #9)
> I am not confident we can fix this in clang/llvm.
> The difference between gcc and clang, AFAICT, is not in the asan
> instrumentation, but in inining.
> Looking at the reproducer from
> https://bugs.llvm.org/show_bug.cgi?id=38809#c4,
> if I change ltv350qv_write_reg to always inline, gcc also produces a
> huge 10K stack frame,
> and making it noinline fixes the stack frame for both gcc and clang.

Would it be worthwhile for us to limit inlining when building with LLVM?

I see a few options in llvm/lib/Analysis/InlineCost.cpp.  Perhaps by playing with `-mllvm -inline-threshold=<number>`, we could reduce the amount of inlining that occurs?  I assume that won't break __attribute__((always_inline)) (when it succeeds), and would help limit the frame sizes from growing too much?  Probably won't run as fast, but I think folks are willing to accept that tradeoff.

If we find that that is generally helpful for this very painful and constant source of bugreports, would it be possible for LLVM to simply update that value when using -fsanitize=address?
Comment 16 Dmitry Vyukov 2021-02-01 00:15:04 PST
Based on #4 and #5 the problem is with stack slot reuse.
Reducing inlining will of course help to some degree in some cases, but at the same time won't address other cases (cheap functions with large stack frames). I would assume the current inlining model doesn't take stack use into account. We could change inlining cost model to include stack use under asan, don't know how much complexity it will add.
Comment 17 Evgenii Stepanov 2021-09-13 14:42:14 PDT
If it is the warning that's a problem, not the actual stack usage, we could use the dynamic alloca approach in the kernel-address mode, too.

Tweaking the inlining model does not sound very scary to me, and it might help. Worth a try.
Comment 18 Arnd Bergmann 2021-10-07 03:53:52 PDT
As discussed in a video meeting yesterday, I have done a few builds with my 'randconfig' kernel tree that has bugfixes for known problems, to isolate files that exceed the warning limit with clang but not gcc when using KASAN_STACK, nor when using clang using KASAN but no KASAN_STACK.

For x86_64 and arm64 allmodconfig builds, I got a relatively compact list of affected files that should be easily reproducible using clang-13 on linux-5.15-rc2 or newer with the allmodconfig builds. All warnings that are not in this list have a pending kernel fix or workaround. Listing only one warning per file for the worst cases, anything with more than 3KB compared to less than 2KB for the other builds:

crypto/ecc.c:1276:13: error: stack frame size (3456) exceeds limit (2048) in function 'ecc_point_mult' [-Werror,-Wframe-drivers/gpu/drm/i915/gt/intel_workarounds.c:1608:1: error: stack frame size (3512) exceeds limit (2048) in function 'rcs_engine_wa_init' [-Werror,-Wframe-larger-than]
drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgk104.c:1521:1: error: stack frame size (5856) exceeds limit (2048) in function 'gk104_ram_new_' [-Werror,-Wframe-larger-than]
drivers/hwmon/occ/common.c:1150:5: error: stack frame size (3232) exceeds limit (2048) in function 'occ_setup' [-Werror,-Wframe-larger-than]
drivers/infiniband/hw/ocrdma/ocrdma_stats.c:686:16: error: stack frame size (20704) exceeds limit (2048) in function 'ocrdma_dbgfs_ops_read' [-Werror,-Wframe-larger-than]
drivers/media/common/cx2341x.c:1574:5: error: stack frame size (3072) exceeds limit (2048) in function 'cx2341x_handler_init' [-Werror,-Wframe-larger-than]
drivers/media/i2c/cx25840/cx25840-core.c:2294:12: error: stack frame size (3200) exceeds limit (2048) in function 'cx25840_reset' [-Werror,-Wframe-larger-than]
drivers/media/pci/ddbridge/ddbridge-core.c:2365:6: error: stack frame size (3488) exceeds limit (2048) in function 'ddb_ports_init' [-Werror,-Wframe-larger-than]
drivers/media/usb/gspca/sn9c2028.c:802:12: error: stack frame size (3136) exceeds limit (2048) in function 'sd_start' [-Werror,-Wframe-larger-than]
drivers/misc/lattice-ecp3-config.c:65:13: error: stack frame size (2080) exceeds limit (2048) in function 'firmware_load' [-Werror,-Wframe-larger-than]
drivers/net/wan/slic_ds26522.c:203:12: error: stack frame size (16120) exceeds limit (2048) in function 'slic_ds26522_probe' [-Werror,-Wframe-larger-than]
drivers/net/wireless/ath/ath11k/qmi.c:2695:13: error: stack frame size (4416) exceeds limit (2048) in function 'ath11k_qmi_driver_event_work' [-Werror,-Wframe-larger-than]
drivers/net/wireless/atmel/atmel.c:1305:5: error: stack frame size (5056) exceeds limit (2048) in function 'atmel_open' [-Werror,-Wframe-larger-than]
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17019:13: error: stack frame size (6400) exceeds limit (2048) in function 'wlc_phy_workarounds_nphy' [-Werror,-Wframe-larger-than]
drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c:855:5: error: stack frame size (4640) exceeds limit (2048) in function 'iwl_mvm_ftm_start' [-Werror,-Wframe-larger-than]
drivers/scsi/qla2xxx/qla_bsg.c:2787:1: error: stack frame size (3200) exceeds limit (2048) in function 'qla2x00_process_vendor_specific' [-Werror,-Wframe-larger-than]
drivers/staging/media/hantro/hantro_g2_hevc_dec.c:536:5: error: stack frame size (3648) exceeds limit (2048) in function 'hantro_g2_hevc_dec_run' [-Werror,-Wframe-larger-than]
drivers/staging/pi433/rf69.c:703:5: error: stack frame size (2080) exceeds limit (2048) in function 'rf69_set_sync_values' [-Werror,-Wframe-larger-than]
drivers/staging/rtl8723bs/core/rtw_security.c:1288:5: error: stack frame size (6592) exceeds limit (2048) in function 'rtw_aes_decrypt' [-Werror,-Wframe-larger-than]
drivers/staging/wlan-ng/cfg80211.c:436:12: error: stack frame size (3904) exceeds limit (2048) in function 'prism2_connect' [-Werror,-Wframe-larger-than]
drivers/usb/misc/sisusbvga/sisusb.c:1878:12: error: stack frame size (4000) exceeds limit (2048) in function 'sisusb_init_gfxcore' [-Werror,-Wframe-larger-than]
drivers/video/fbdev/omap2/omapfb/displays/panel-lgphilips-lb035q02.c:117:12: error: stack frame size (15136) exceeds limit (2048) in function 'lb035q02_connect' [-Werror,-Wframe-larger-than]
drivers/visorbus/visorchipset.c:1499:13: error: stack frame size (3000) exceeds limit (2048) in function 'controlvm_periodic_work' [-Werror,-Wframe-larger-than]
lib/bitfield_kunit.c:60:20: error: stack frame size (12224) exceeds limit (2048) in function 'test_bitfields_constants' [-Werror,-Wframe-larger-than]
lib/test_kasan.c:946:13: error: stack frame size (3224) exceeds limit (2048) in function 'kasan_bitops_generic' [-Werror,-Wframe-larger-than]
lib/test_scanf.c:217:20: error: stack frame size (3704) exceeds limit (2048) in function 'numbers_simple' [-Werror,-Wframe-larger-than]
lib/test_scanf.c:217:20: error: stack frame size (5024) exceeds limit (2048) in function 'numbers_simple' [-Werror,-Wframe-larger-than]
sound/usb/mixer_s1810c.c:543:5: error: stack frame size (2072) exceeds limit (2048) in function 'snd_sc1810_init_mixer' [-Werror,-Wframe-larger-than]


For 32-bit arm, the list was much longer even when applying the same 3KB limit. Looking only at files that are not already in the list above, the worst ones I see are 

drivers/media/dvb-frontends/drxd_hard.c:2857:12: error: stack frame size (7584) exceeds limit (1400) in function 'drxd_set_frontend' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/nxt200x.c:519:12: error: stack frame size (6848) exceeds limit (1400) in function 'nxt200x_setup_frontend_parameters' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/nxt200x.c:915:12: error: stack frame size (8192) exceeds limit (1400) in function 'nxt2004_init' [-Werror,-Wframe-larger-than]
drivers/net/usb/r8152.c:7158:13: error: stack frame size (6048) exceeds limit (1400) in function 'r8156_hw_phy_cfg' [-Werror,-Wframe-larger-than]
drivers/net/usb/r8152.c:7503:13: error: stack frame size (8768) exceeds limit (1400) in function 'r8156b_hw_phy_cfg' [-Werror,-Wframe-larger-than]
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9012:13: error: stack frame size (9952) exceeds limit (1400) in function 'rt2800_init_rfcsr' [-Werror,-Wframe-larger-than]


I also happened to run some more arm32 randconfig builds, which showed additional issues. I can provide .config files for these if anyone wants to take a look, the worst ones here are:

drivers/base/test/property-entry-test.c:117:13: error: stack frame size  5600  exceeds limit  1400  in function 'pe_test_uint_arrays' [-Werror,-Wframe-larger-than]
drivers/gpu/drm/panel/panel-samsung-ld9040.c:239:12: error: stack frame size  5472  exceeds limit  1400  in function 'ld9040_prepare' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/mb86a20s.c:1562:12: error: stack frame size  7584  exceeds limit  1400  in function 'mb86a20s_get_stats' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/nxt200x.c:1087:12: error: stack frame size  17312  exceeds limit  1400  in function 'nxt200x_init' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/stv0367.c:2540:12: error: stack frame size  9696  exceeds limit  1400  in function 'stv0367cab_set_frontend' [-Werror,-Wframe-larger-than]
drivers/media/dvb-frontends/stv090x.c:1976:12: error: stack frame size  12416  exceeds limit  1400  in function 'stv090x_blind_search' [-Werror,-Wframe-larger-than]
drivers/media/i2c/adv7842.c:2853:12: error: stack frame size  15712  exceeds limit  1400  in function 'adv7842_log_status' [-Werror,-Wframe-larger-than]
drivers/media/i2c/cx25840/cx25840-core.c:2294:12: error: stack frame size  5088  exceeds limit  1400  in function 'cx25840_reset' [-Werror,-Wframe-larger-than]
drivers/usb/gadget/udc/max3420_udc.c:841:12: error: stack frame size  9376  exceeds limit  1400  in function 'max3420_thread' [-Werror,-Wframe-larger-than]
drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:185:12: error: stack frame size  10976  exceeds limit  1400  in function 'td028ttec1_panel_enable' [-Werror,-Wframe-larger-than]
drivers/thunderbolt/test.c:2403:13: error: stack frame size  8256  exceeds limit  1400  in function 'tb_test_credit_alloc_all' [-Werror,-Wframe-larger-than]
Comment 19 Arnd Bergmann 2021-10-07 06:35:47 PDT
To broadly classify the ones I've listed in comment #18:

- Most of the warnings are in rarely used and older drivers, which makes them somewhat lower priority to investigate, but it also means that they would not have been caught at runtime

- Several affected files use the "kunit" framework. This hits a similar problem with gcc when using the CONFIG_GCC_PLUGIN_STRUCTLEAK option, but not when using CONFIG_KASAN_STACK. We have worked around the problem for gcc by turning off the structleak feature for those files. This affects drivers/thunderbolt/test.c, 
drivers/base/test/property-entry-test.c and lib/test_kasan.c among others.

- There is another cluster in drivers/media/, with functions that (after inlining) have numerous calls to a function such as

int cx25840_write(struct i2c_client *client, u16 addr, u8 value)
{
        u8 buffer[3];

        buffer[0] = addr >> 8;
        buffer[1] = addr & 0xff;
        buffer[2] = value;
        return i2c_master_send(client, buffer, 3);
}

I think this is a case of adding way much more padding around each 3-byte array than gcc does, so this could be addressed for reducing the amount of padding for small stack variables, or it could be addressed by not inlining cx25840_write().
Another option might be if we can annotate i2c_master_send() in a way that tells the compiler that it does not access more than N bytes from the buffer in argument 2, with N being passed as argument 3.