[Coroutines] Coroutine frame allocated with a wrong alignment #53148

ezhulenev · 2022-01-12T08:04:56Z

Example: https://godbolt.org/z/Ea8rPzG18

If a value is used across suspension points and it is stored into the coroutine frame, it might have an invalid alignment. It seems that memory for the coroutine frame is allocated with a plain call to operator new(size).

We stumbled upon this problem when generating LLVM IR from MLIR, but it's easy to reproduce it with C++20 coroutines as well.

struct alignas(512) overaligned {
 overaligned() { std::cout << "Consructed: " << ((intptr_t)(this) % 512) << "\n"; }
 ~overaligned() { std::cout << "Destructed: " << ((intptr_t)(this) % 512) << "\n"; }
 int n;
};

void sink(overaligned&) {}

cppcoro::generator<const std::uint64_t> fibonacci()
{
  std::uint64_t a = 0, b = 1;
  overaligned st;
  sink(st);
  while (true)
  {
    co_yield b;
    auto tmp = a;
    a = b;
    b += tmp;
    sink(st);
  }
}

output:

Run
Consructed: 0       <--- this is from stack allocated value in main
Consructed: 176   <--- this is from the coroutine
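For comparison, here is a minimal sketch of what an alignment-aware allocation looks like outside of coroutines. It assumes a C++17 compiler; the helper names `misalignment` and `aligned_new_is_aligned` are hypothetical and not part of the original reproducer. Plain `operator new(size)` only guarantees `__STDCPP_DEFAULT_NEW_ALIGNMENT__` (commonly 16), while the `std::align_val_t` overload honors the requested alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Same over-aligned type as in the report, without the logging.
struct alignas(512) overaligned { int n; };

// Hypothetical helper: address of p modulo the required alignment.
std::uintptr_t misalignment(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(overaligned);
}

// Hypothetical helper: allocate raw storage through the alignment-aware
// overload of operator new and report whether the result is 512-aligned.
bool aligned_new_is_aligned() {
  const std::align_val_t al{alignof(overaligned)};
  void* p = ::operator new(sizeof(overaligned), al);
  const bool ok = misalignment(p) == 0;
  ::operator delete(p, al);  // must use the matching aligned delete
  return ok;
}
```

Calling `aligned_new_is_aligned()` should return `true`; the plain `::operator new(sizeof(overaligned))` makes no such promise for a 512-byte alignment.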


ChuanqiXu9 · 2022-01-13T01:54:59Z

Yeah, this is a known issue: the alignment requirement can't be satisfied if the alignment of any element is larger than 16. This issue shows that we had better handle this in the middle end, since the switch-based coroutine intrinsics have other users.

ezhulenev · 2022-01-13T06:46:14Z

In our particular MLIR example the problem is in unaligned vmovaps ymm0

ChuanqiXu9 · 2022-01-13T06:48:20Z

> In our particular MLIR example the problem is in unaligned vmovaps ymm0

This is lowered assembly. Could you offer the generated LLVM IR from MLIR?

ezhulenev · 2022-01-13T06:50:06Z

@d0k do you have the LLVM IR from your debugging session?

d0k · 2022-01-13T15:45:51Z

I don't have that IR anymore, but it happens whenever an AVX2 __m256 gets spilled. I adapted the test case from above and it segfaults: https://godbolt.org/z/z6jd3P4PM

ChuanqiXu9 · 2022-01-14T02:02:50Z

Without the LLVM IR, I can only assume that the problem in this case is the same as the problem in C++ (the alignment of elements can't be larger than 16). BTW, it should be possible to fix the above problem in the frontend. For example, the frontend could call operator new(std::size_t, std::align_val_t) if it detects elements whose alignment is larger than 16. We can't do this in Clang right now since it would violate the C++ standard, but I guess it might be possible to do it in MLIR.
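The frontend-side choice described above could look roughly like the following sketch. `allocate_frame` and `deallocate_frame` are hypothetical names used only for illustration; this is not what Clang or MLIR emits, just the dispatch between the plain and the aligned allocation function, assuming C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: pick the aligned overload only when the frame needs
// more than the default new alignment (commonly 16 bytes).
void* allocate_frame(std::size_t size, std::size_t align) {
  if (align > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
    return ::operator new(size, std::align_val_t{align});
  return ::operator new(size);
}

// Hypothetical helper: the deallocation must mirror the allocation choice.
void deallocate_frame(void* p, std::size_t align) {
  if (align > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
    ::operator delete(p, std::align_val_t{align});
  else
    ::operator delete(p);
}
```

The important design point is that allocation and deallocation agree on the alignment; mixing the aligned `operator new` with the plain `operator delete` is undefined behavior.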

ezhulenev · 2022-01-14T08:39:04Z

We don't emit operator new in MLIR. We just emit functions with coroutine intrinsics according to the switch-resume lowering, and it's the LLVM coro pass that inserts the call to new. I assume this pass should call aligned new instead, to respect the alignment requirements of captured values.

ChuanqiXu9 · 2022-01-14T08:48:10Z

Oh, that surprises me. I never knew the coro passes would insert a call to operator new. If they do, they shouldn't, since LLVM shouldn't depend on C++. And I am pretty sure that it's Clang that inserts operator new in the case of C++20 coroutines. So I think it would be helpful to look at the LLVM IR generated from MLIR.

d0k · 2022-01-14T11:11:15Z

MLIR calls malloc directly instead of operator new. I guess it could align that block of memory, but to what value?

llvm-project/mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp

Line 370 in d4d0168

auto coroAlloc = rewriter.create<LLVM::CallOp>(

ChuanqiXu9 · 2022-01-17T02:22:54Z

Yeah, it should be the frontend that generates calls to allocation functions (whether operator new or malloc).

> I guess it could align that block of memory, but to what value?

Yeah, right now there is no intrinsic from which the frontend can get the alignment value. I will try to provide one soon.

Coroutine lowering always takes the natural alignment when spilling to the frame (issue #53148) so using AVX2 or AVX512 in a coroutine doesn't work. Always overalign to 64 bytes to avoid this issue until we have a better solution. Differential Revision: https://reviews.llvm.org/D117501
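As a rough illustration of the "always overalign to 64 bytes" workaround, the sketch below allocates a frame with `std::aligned_alloc`. The names `kFrameAlign`, `round_up`, and `alloc_frame_overaligned` are hypothetical, and this is not the code from the linked revision. It assumes a C++17 standard library that provides `std::aligned_alloc`, and it rounds the size up because C11 originally required the size to be a multiple of the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed fixed over-alignment, taken from the commit message above.
constexpr std::size_t kFrameAlign = 64;

// Round n up to the next multiple of a (a must be non-zero).
constexpr std::size_t round_up(std::size_t n, std::size_t a) {
  return (n + a - 1) / a * a;
}

// Hypothetical helper: allocate a coroutine frame of `size` bytes on a
// 64-byte boundary. The caller releases it with std::free.
void* alloc_frame_overaligned(std::size_t size) {
  return std::aligned_alloc(kFrameAlign, round_up(size, kFrameAlign));
}
```

A fixed 64-byte alignment covers AVX2 (32-byte) and AVX-512 (64-byte) spills, but not a type such as the `alignas(512)` struct from the original report.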

ChuanqiXu9 · 2022-01-19T01:54:35Z

We have now sent https://reviews.llvm.org/D117542, which offers the llvm.coro.align intrinsic. So we should be able to solve the problem by passing the corresponding alignment to aligned_alloc when emitting LLVM IR.
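To make the "to what value?" question concrete, here is a C++ analogy, not the intrinsic itself: the alignment a frame needs is the largest alignment among the values stored in it. `vec256` and `frame_alignment_v` are hypothetical names; `vec256` merely stands in for a 32-byte-aligned AVX2 value such as `__m256`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for a 256-bit vector value that must be 32-byte aligned.
struct alignas(32) vec256 { float lanes[8]; };

// Hypothetical helper: the strictest alignment among the types that would
// be spilled into a frame. A frontend would pass a value like this to its
// aligned allocation function.
template <class T, class... Ts>
constexpr std::size_t frame_alignment_v =
    std::max({alignof(T), alignof(Ts)...});
```

For example, `frame_alignment_v<std::uint64_t, vec256>` evaluates to 32, which is the value a frontend would then hand to a call such as `aligned_alloc`.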

ChuanqiXu9 · 2022-01-25T11:29:49Z

This should be fixed in: dbbe010

ezhulenev added c++20 llvm:codegen labels Jan 12, 2022

ChuanqiXu9 self-assigned this Jan 12, 2022

ChuanqiXu9 closed this as completed Jan 25, 2022
