[X86][SSE] Comparison result extractions should use MOVMSK to simplify mask handling #39013
Comments
Generalizing this bug, as it's not just masked memory ops that suffer: https://gcc.godbolt.org/z/6qjhy4
llc -mcpu=btver2
Some simpler cases: https://godbolt.org/z/sy5EUA
This might be a simple extension of:
We do have that transform in IR already. Do we need it in codegen too?
And the answer is 'yes' if I'm seeing it correctly. We can't hoist the extract in IR in the more general case in comment 1 because both operands are variables. That requires knowing that extract of element 0 of an FP vector is free.
We may be able to do some of this in SLP by recognising these as allof/anyof reductions of boolean results.
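For reference, a minimal sketch of what an allof/anyof reduction over comparison results looks like when it goes through a single movmsk rather than per-lane extracts. The function names are made up for illustration, and the non-SSE2 path is only there so the snippet builds anywhere; it is not taken from the godbolt links above.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: gather the 16 byte-wise equality results of a and b
   into a 16-bit mask, with bit i set when a[i] == b[i]. */
static unsigned eq_mask_u8x16(const uint8_t a[16], const uint8_t b[16]) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pcmpeqb gives 0xFF or 0x00 per lane; pmovmskb collects the sign bits. */
    return (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
#else
    /* Portable fallback that produces the same mask one lane at a time. */
    unsigned mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= (unsigned)(a[i] == b[i]) << i;
    return mask;
#endif
}

/* allof: every lane compared equal, so all 16 mask bits are set. */
static bool all_equal_u8x16(const uint8_t a[16], const uint8_t b[16]) {
    return eq_mask_u8x16(a, b) == 0xFFFFu;
}

/* anyof: at least one lane compared equal, so the mask is non-zero. */
static bool any_equal_u8x16(const uint8_t a[16], const uint8_t b[16]) {
    return eq_mask_u8x16(a, b) != 0;
}
```

Once the mask is in a scalar register, both reductions are a single compare against a constant, which is the shape the SLP recognition would need to produce.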
We're scalarizing now...so current codegen for an AVX target:
So the question for the comment 1 example is similar to bug 41145 - should we prefer blendv or branches?
And codegen for the example in the description is now:
So I think it's the same question: bitwise select or branch?
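To make the two options concrete, here is a per-lane scalar model of the choice, not the actual codegen from either example. The function names are hypothetical, and a compiler is free to turn either form into the other.

```c
#include <stdint.h>

/* Branching form: one conditional jump per selected value. */
static uint32_t select_branch(int cond, uint32_t x, uint32_t y) {
    if (cond)
        return x;
    return y;
}

/* Bitwise form: build an all-ones or all-zeros mask from the condition,
   then merge with and/andn/or. This is the per-lane behaviour of a blend. */
static uint32_t select_bitwise(int cond, uint32_t x, uint32_t y) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (x & mask) | (y & ~mask);
}
```

The bitwise form always computes both inputs and has no control dependence; the branching form can skip work on the untaken side but pays for mispredictions. Which one wins depends on how predictable the comparison result is.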
That's for AVX1 targets, the original example was for SSE42 - but v8i16/v16i8 tests on AVX1 have similar issues.
I suspect the answer in general is:
Maybe for loads - but for that to work we need to guarantee that the entire range is dereferenceable - I'm not sure if masked loads can/should currently fold to regular loads in that case?
For x86, I think we have to know/prove that the ends of the masked load are actual loads in order to convert the whole thing to a regular load. http://lists.llvm.org/pipermail/llvm-dev/2016-April/097927.html
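A rough scalar model of that condition, assuming the "ends are actual loads" rule from the comment above is the right criterion. All names here are hypothetical and none of this is an LLVM API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: are the first and last lanes of the mask enabled?
   If so, the masked load already touches both ends of the range. */
static bool mask_ends_enabled(const bool *mask, size_t lanes) {
    return lanes > 0 && mask[0] && mask[lanes - 1];
}

/* Reference masked load: disabled lanes take passthru and never read src. */
static void masked_load(int *dst, const int *src, const bool *mask,
                        const int *passthru, size_t lanes) {
    for (size_t i = 0; i < lanes; ++i)
        dst[i] = mask[i] ? src[i] : passthru[i];
}

/* Widened form: read every lane unconditionally, then select.
   Only valid when all of src[0..lanes) is known to be dereferenceable. */
static void load_then_select(int *dst, const int *src, const bool *mask,
                             const int *passthru, size_t lanes) {
    for (size_t i = 0; i < lanes; ++i) {
        int loaded = src[i]; /* unconditional read of this lane */
        dst[i] = mask[i] ? loaded : passthru[i];
    }
}
```

Both forms produce the same values; the only difference is which addresses get read, which is why the fold needs the dereferenceability proof.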
Resolving - we do a good job of using movmsk now |
This can replace all the pextrb instructions with a single movmsk, giving something like: