Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coroutines] Coroutine frame allocated with a wrong alignment #53148

Closed
ezhulenev opened this issue Jan 12, 2022 · 12 comments
Closed

[Coroutines] Coroutine frame allocated with a wrong alignment #53148

ezhulenev opened this issue Jan 12, 2022 · 12 comments
Assignees

Comments

@ezhulenev
Copy link
Member

Example: https://godbolt.org/z/Ea8rPzG18

If a value is used across suspension points and it is stored into the coroutine frame, it might have an invalid alignment. It seems that memory for the coroutine frame is allocated with a plain call to operator new(size).

We stumbled upon this problem when generating LLVM IR from MLIR, but it's easy to reproduce it with C++20 coroutines as well.

struct alignas(512) overaligned {
 overaligned() { std::cout << "Consructed: " << ((intptr_t)(this) % 512) << "\n"; }
 ~overaligned() { std::cout << "Destructed: " << ((intptr_t)(this) % 512) << "\n"; }
 int n;
};

void sink(overaligned&) {

cppcoro::generator<const std::uint64_t> fibonacci()
{
  std::uint64_t a = 0, b = 1;
  overaligned st;
  sink(st);
  while (true)
  {
    co_yield b;
    auto tmp = a;
    a = b;
    b += tmp;
    sink(st);
  }
}

output:

Run
Consructed: 0       <--- this is from stack allocated value in main
Consructed: 176   <--- this is from the coroutine
@ChuanqiXu9
Copy link
Member

Yeah, this is a known issue that the alignment requirement couldn't be satisfied if the alignment requirement of elements is large than 16... This issue shows we would better handle this in middle end since there are other users for switch based coroutine intrinsics.

@ezhulenev
Copy link
Member Author

In our particular MLIR example the problem is in unaligned vmovaps ymm0

@ChuanqiXu9
Copy link
Member

In our particular MLIR example the problem is in unaligned vmovaps ymm0

This is lowered assembly. Could you offer the generated LLVM IR from MLIR?

@ezhulenev
Copy link
Member Author

@d0k do you have a LLVM IR from your debugging session?

@d0k
Copy link
Member

d0k commented Jan 13, 2022

I don't have that IR anymore, but it happens whenever an AVX2 __m256 gets spilled. Adapted the test case from above, segfaults: https://godbolt.org/z/z6jd3P4PM

@ChuanqiXu9
Copy link
Member

If it lacks a LLVM IR, I could only assume the problem in the case is same with the problem in C++ (The alignment of elements couldn't large than15). BTW, it should be possible to fix the above problem in frontend. For example, the frontend could call std::new(size_t, align_t) if it detects elements whose alignment is larger than 16. We couldn't do this in clang now since it violates the C++ standard. But I guess it might be possible to do it in MLIR.

@ezhulenev
Copy link
Member Author

We don't emit std::new in MLIR, we just emit functions with coroutine intrinsics according to switch-resume lowering, and it's the LLVM coro pass that inserts the call to new, and I assume that this pass has instead to call aligned new, to respect the alignment requirements of captured values.

@ChuanqiXu9
Copy link
Member

Oh, it surprises me. I never knew the coro passes would insert call to std::new. If it did, it shouldn't be. Since LLVM shouldn't depend on C++. And I am pretty sure that it's the clang which inserted std::new in case of C++20 coroutines. So I think it would be helpful to look at the LLVM IR generated from MLIR.

@d0k
Copy link
Member

d0k commented Jan 14, 2022

MLIR calls malloc directly instead of operator new. I guess it could align that block of memory, but to what value?

auto coroAlloc = rewriter.create<LLVM::CallOp>(

@ChuanqiXu9
Copy link
Member

Yeah, it should be the frontend to generate calls to allocation functions (no matter std::new or malloc).

I guess it could align that block of memory, but to what value?

Yeah, now it lacks an intrinsic that the frontend could get the value of the alignment. I would try to provide one recently.

d0k added a commit that referenced this issue Jan 17, 2022
Coroutine lowering always takes the natural alignment when spilling to
the frame (issue #53148) so using AVX2 or AVX512 in a coroutine doesn't
work. Always overalign to 64 bytes to avoid this issue until we have a
better solution.

Differential Revision: https://reviews.llvm.org/D117501
@ChuanqiXu9
Copy link
Member

Now we sent https://reviews.llvm.org/D117542 to offer llvm.coro.align intrinsic. So we should be able to solve the problem by fulfilling the corresponding alignment to aligned_alloc when emitting LLVM IR.

@ChuanqiXu9
Copy link
Member

This should be fixed in: dbbe010

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants