On x86, (V)MOVNTPS can be used to perform non-temporal vector stores.
However, VMOVNTPS requires its destination memory operand to be 16-byte aligned (for 128-bit stores) or 32-byte aligned (for 256-bit stores).
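A minimal sketch of the alignment rule, assuming an x86 compiler that exposes the SSE intrinsics in `<xmmintrin.h>`. `_mm_stream_ps` compiles to (V)MOVNTPS, so it is only used when the destination is 16-byte aligned. Otherwise the function falls back to `_mm_storeu_ps`, an ordinary (temporal) unaligned store. The `_mm_sfence` after the streaming store is there because non-temporal stores are weakly ordered. The function name `store4_maybe_nontemporal` is hypothetical, and on non-x86 builds the sketch degrades to a plain scalar copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: store four floats to dst.
 * Returns true if a non-temporal (MOVNTPS) store was used, false if it fell
 * back to an ordinary store because dst is not 16-byte aligned (or SSE is
 * unavailable). */
bool store4_maybe_nontemporal(float *dst, const float src[4])
{
    bool aligned16 = ((uintptr_t)dst & 15u) == 0;
#if HAVE_SSE
    __m128 v = _mm_loadu_ps(src);
    if (aligned16) {
        _mm_stream_ps(dst, v); /* MOVNTPS: destination must be 16-byte aligned */
        _mm_sfence();          /* order the weakly-ordered NT store before later stores */
        return true;
    }
    _mm_storeu_ps(dst, v);     /* MOVUPS: works unaligned, but is a normal (temporal) store */
    return false;
#else
    (void)aligned16;
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i];       /* portable fallback: plain stores */
    return false;
#endif
}
```

Calling it with `buf + 1` on a 16-byte-aligned `float` buffer always takes the temporal path, since that address is only 4-byte aligned.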
In this example, the store instructions are marked as 4-byte aligned.
When the loop vectorizer kicks in, it generates a vector loop body in which all vector stores are correctly annotated with the "!nontemporal" metadata and an alignment of 4.
x86, however, provides no unaligned variant of these non-temporal vector stores.
As a result, ISel falls back to selecting ordinary (i.e. "temporal") unaligned stores (see the VMOVUPS in the assembly above).
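As a rough C-level analogue of the situation above (not the reporter's original example), the hypothetical `copy_floats_nt` below is a memcpy-like loop in which every store carries a non-temporal hint via clang's `__builtin_nontemporal_store`, which clang lowers to a store with `!nontemporal` metadata. The destination is only known to be `float`-aligned (4 bytes). On compilers without that builtin, the hypothetical `NT_STORE` macro degrades to a plain store. Compiling it with clang for an AVX target is one way to inspect which vector stores the vectorizer ends up emitting; the exact output depends on the compiler version and flags.

```c
#include <stddef.h>

/* Detect clang's non-temporal store builtin without assuming __has_builtin exists. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_nontemporal_store)
#    define HAVE_NT_BUILTIN 1
#  endif
#endif

/* Hypothetical macro: a store hinted as non-temporal where supported,
 * otherwise an ordinary store. */
#ifdef HAVE_NT_BUILTIN
#define NT_STORE(val, ptr) __builtin_nontemporal_store((val), (ptr))
#else
#define NT_STORE(val, ptr) (*(ptr) = (val))
#endif

/* memcpy-like loop: dst is only known to be 4-byte aligned, so a vectorized
 * version cannot assume the 16/32-byte alignment that VMOVNTPS needs. */
void copy_floats_nt(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        NT_STORE(src[i], &dst[i]);
}
```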
When vectorizing a memcpy-like loop, we should probably check whether the target supports unaligned non-temporal vector stores before transforming the loop. Otherwise, we risk accidentally introducing temporal stores that pollute the caches.
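A simplified model of the proposed check, assuming only 128-bit and 256-bit non-temporal stores are considered (wider AVX-512 forms are ignored). The hypothetical target hook `target_has_nt_vector_store` encodes the x86 alignment rule, and the hypothetical `pick_vf_for_nt_stores` picks the widest vectorization factor whose non-temporal store the target can emit as-is, returning 1 (leave the loop scalar) instead of letting ISel silently fall back to temporal stores. Neither function corresponds to an actual LLVM API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical target hook modelling x86: a non-temporal vector store
 * (MOVNTPS/VMOVNTPS) exists only for 16- and 32-byte vectors, and only
 * when the destination is aligned to the full vector width. */
static bool target_has_nt_vector_store(size_t vector_bytes, size_t align_bytes)
{
    return (vector_bytes == 16 || vector_bytes == 32) && align_bytes >= vector_bytes;
}

/* Hypothetical vectorizer decision for a loop whose stores are non-temporal.
 * max_vf is assumed to be a power of two. Tries the widest factor first and
 * halves it; returns 1 ("do not vectorize") if no width is legal. */
size_t pick_vf_for_nt_stores(size_t elem_bytes, size_t align_bytes, size_t max_vf)
{
    for (size_t vf = max_vf; vf > 1; vf /= 2) {
        if (target_has_nt_vector_store(vf * elem_bytes, align_bytes))
            return vf;
    }
    return 1;
}
```

With `float` elements (4 bytes) and 4-byte alignment, no vector width qualifies, so `pick_vf_for_nt_stores(4, 4, 8)` returns 1; with a 16-byte-aligned destination it returns 4. The alternative of vectorizing anyway trades the non-temporal hint for wider stores; this model takes the conservative route of keeping the hint.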