Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvm-mc rejected LDC and STC for coprocessors P10 and P11 in generic coprocessor form #20399

Closed
llvmbot opened this issue Jun 13, 2014 · 21 comments
Assignees
Labels
backend:ARM bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 13, 2014

Bugzilla Link 20025
Resolution FIXED
Resolved on Dec 08, 2015 08:48
Version trunk
OS All
Blocks #20903
Reporter LLVM Bugzilla Contributor
CC @compnerd,@ismail,@jsonn,@rengolin,@RichBarton-Arm,@kaomoneus

Extended Description

llvm-mc rejected LDC and STC instruction for coprocessors P10 and P11
in generic coprocessor form:
LDC p10, ...
STC p11, ...
that used in open source libraries to enable VFP instruction
compilation for more targets.
Library may be compiled without VFP support enabled.

gcc compile this generic coprocessor form.

Problem in ARMAsmParser.cpp
P10 and P11 rejected in
static int MatchCoprocessorOperandName(StringRef Name, char CoprocOp)
// p10 and p11 are invalid for coproc instructions (reserved for FP/NEON)

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jun 13, 2014

assigned to @rengolin

@rengolin
Copy link
Member

rengolin commented Aug 4, 2014

This is what caused half of #20903 's problems.

@rengolin
Copy link
Member

rengolin commented Aug 4, 2014

Duplicating my comments...

So, these co-processor instructions were allowed (and somewhat encouraged) in v5/v6 processors due to the poor nature of VFP support in those days, but v7 and v8 explicitly recommend using vector instructions instead.

I haven't found an explicit prohibition in the ARM ARM (either v7 or v8) but Richard has stated some strong facts about it earlier, so I'm inclined to continue the trend and forbid cp10/cp11 on those architectures.

I'll allow them on v5/v6, but for v7/v8 libc++abi's unwind libraries will have to use vector loads and stores.

@rengolin
Copy link
Member

rengolin commented Aug 5, 2014

This problem is fixed in r214802 for ARMv5 and ARMv6 cores, but not v7 and v8, since in those, access to cp10 and cp11 was deprecated and vector instructions should be used instead.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 22, 2015

Leaving it broken for v7 and v8 is not a sufficient fix. This issue has been affecting musl libc for a long time. For setjmp/longjmp, we need to be able to use forms of vstmia/vldmia that:

  1. Do not cause the object file to be ABI-tagged as needing VFP.
  2. Assemble even on targets that don't have VFP.
  3. Assemble to ARM or Thumb2 properly depending on target.

Right now we're stuck with using .word to meet requirements 1 and 2, which is utterly awful, and trying to find a suitable way to add support for Thumb2-only code generation.

@rengolin
Copy link
Member

Leaving it broken for v7 and v8 is not a sufficient fix.

I don't think this is broken in any way. It's not because GCC does it one way that it's the right way. It's not because user code has gotten cosy with GCC's way that every other compiler has to implement identically.

You're also assuming that, by using another compiler, you should get the exact same results as GCC, thus there is no need for you to change your code, your build system, anything. If that's the case, why change compilers in the first place?

However, if there is a bug, or a complication, we should investigate. But from the reports here, I can't tell if there really is a complication, or you're just expecting Clang to mimic GCC, because you haven't shared the problem, just what you think is the right solution.

  1. Do not cause the object file to be ABI-tagged as needing VFP.
  2. Assemble even on targets that don't have VFP.

Can't you use .arch/.fpu for those cases, like we do on UnwindRegisterRestore/Save.S?

  1. Assemble to ARM or Thumb2 properly depending on target.

I'm not sure what you want here...

If you want to emit Thumb/ARM depending on targets, you can use the -mthumb flag. Shouldn't be too hard to adapt build systems to behave.

Or do you want the same ASM source to be converted into ARM or Thumb?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 22, 2015

GCC is not involved; this is an asm source file. The stc and ldc instructions are documented in the ARM ARM and should be accepted by an assembler.

The problem we have is that all the assemblers are being overly (and in my opinion, erroneously) restrictive about what instructions and what mnemonics they accept. We need to be able to write instructions that will only be executed conditionally at runtime (based on hwcap), and that are accepted no matter what ISA level the user has selected to target. Presently the only way we can do this is with .inst or .word, and we're actually using the latter because some assemblers don't support the former.

Does clang have any option we can pass to the assembler to tell it "accept any instruction that's valid for any ISA level"? That would solve the problem, I think.

Can't you use .arch/.fpu for those cases, like we do on UnwindRegisterRestore/Save.S?

Actually this did not work when I first tried it on clang 3.5, but it's fixed now on clang 3.6. However this also requires some method of suppressing the ABI tag Tag_FP_arch. On gas, setting it to 0 with .eabi_attribute causes it not to be emitted at all, but clang's assembler emits a tag with a value of 0. I'm not sure if this is harmful or not but it's at least mildly undesirable.

Or do you want the same ASM source to be converted into ARM or Thumb?

Yes, we want the same file in UAL to assemble to either ARM or Thumb2. Short of any other solution we might have to switch to preprocessed asm files and #ifdef thumb or whatever and use .word, but I'd rather not do that, because it's the same instruction (just with different encoding) either way.

@rengolin
Copy link
Member

GCC is not involved; this is an asm source file.

Sorry, I meant GNU tools, including the assembler.

The stc and ldc
instructions are documented in the ARM ARM and should be accepted by an
assembler.

Our assembler does support both instructions. However, there was a choice of not supporting CP10/CP11 on ARMv7+ due to its compulsory replacement by VFP/SIMD counterparts, as well as it not being supported in AArch64 (only AArch32).

That was not a strong technical decision, so there is room for change. But it has to be a change that is worth the trouble. I don't want to just start adding things because that's how a lot of broken legacy ended up in (and can't be removed from) the GNU toolchain.

The ARM ARM says three things:

  1. Architectures that implement VFP/SIMD have to provide access to CP10/CP11
  2. ARMv7 tools should favour VFP/SIMD instructions over LPC/STC
  3. AArch64 doesn't support STD/LDC at all, AArch32 (v8) only supports CP10/11/14/15.

The problem we have is that all the assemblers are being overly (and in my
opinion, erroneously) restrictive about what instructions and what mnemonics
they accept.

The documents don't help either. The ARM ARM has a lot of what can be done, but less of what can't, or has to be deprecated (on purpose, to not burden legacy toolchains). They also focus more on the architecture than the tools, which is correct, but doesn't help our predicament. :)

The decision to only include what's explicit in LLVM is exactly to not have to carry more burdens than we need.

We need to be able to write instructions that will only be
executed conditionally at runtime (based on hwcap), and that are accepted no
matter what ISA level the user has selected to target.

This sounds a lot like IFUNC, on an instruction level. While LDC/STC can give you that, its not true for all instructions, and support for multiple architectures on the same binary will have to be controlled some other way.

Does clang have any option we can pass to the assembler to tell it "accept
any instruction that's valid for any ISA level"? That would solve the
problem, I think.

No, and that has been discussed already and the conclusion is that we won't. The GNU assembler can cope with a ridiculous amount of rubbish, and that has created a lot of problems for us and them.

Our stance is that we want to fail as soon as possible, as accurate as possible, and fix the false positives as we go. So far, it has worked well, so there's no reason to stop now.

Actually this did not work when I first tried it on clang 3.5, but it's
fixed now on clang 3.6. However this also requires some method of
suppressing the ABI tag Tag_FP_arch. On gas, setting it to 0 with
.eabi_attribute causes it not to be emitted at all, but clang's assembler
emits a tag with a value of 0. I'm not sure if this is harmful or not but
it's at least mildly undesirable.

That sounds like a bug, either on GAS or our assembler. If that affects you, I'd encourage you to open another bug with some way to reproduce it, so we can investigate what's the expected correct behaviour, and liaise with the GNU folks on solving it the best way on both sides.

Yes, we want the same file in UAL to assemble to either ARM or Thumb2. Short
of any other solution we might have to switch to preprocessed asm files and
#ifdef thumb or whatever and use .word, but I'd rather not do that,
because it's the same instruction (just with different encoding) either way.

I'm not sure you can do that in a generic case. Of course, if you write the assembly by hand, you can be careful to get all the ranges and sizes correct, and only use the instructions that can be encoded on both ARM and T2, but I find that solution problematic, as it's prone to tool misbehaviour.

I'm not sure it's clear, but UAL doesn't mean whatever you write will be encoded in either ARM or Thumb, it just means that the mnemonics, register names and general syntax are the same. So, if you use a large enough immediate, you won't be able to assemble to Thumb, no matter if it's UAL or not.

So, from an engineering point of view, the safest way to proceed is to make sure your files are safe from any compiler/assembler behaviour that implement the correct instruction set. Using #ifdef guards where needed and .arch/.fpu to change local support seems like a sensible way forward.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 22, 2015

That was not a strong technical decision, so there is room for change. But it has to be a change that is worth the trouble.

Right now clang is going out of its way to forbid use of the mnemonic based on -march. Changing this is a simplification.

As it stands, as far as I can tell, ARM is the only target where valid instruction mnemonics that used to be accepted suddenly get rejected with new -march. This is really frustrating because it means anyone writing code is writing to a moving target. There is no issue with the actual instructions not working on the hardware (if there were, old binaries would break) but rather just a gratuitous policy/naming issue.

While LDC/STC can give you that, its not true for all instructions, and support for multiple architectures on the same binary will have to be controlled some other way.

The code in question is part of setjmp/longjmp, and the reason it's needed is that it's possible to have a soft-float (standard EABI) libc but call it from code linked with the "softfp" model (using vfp instructions, but matching standard soft-float EABI calling convention). In this case the call-saved vfp registers are not modified by libc code, so normally we wouldn't have to touch them, but longjmp does need to be able to restore them since the caller's "softfp" code might have touched them between setjmp and longjmp.

Fighting with this whole issue has been a huge waste of time for us. It's really frustrating because all the tools differ in what they accept, and it's very hard to write asm that works on them all, because they're all trying to be uselessly pedantic and impose policy rather than just assembling the mnemonics as documented. :(

@rengolin
Copy link
Member

The code in question is part of setjmp/longjmp, and the reason it's needed
is that it's possible to have a soft-float (standard EABI) libc but call it
from code linked with the "softfp" model (using vfp instructions, but
matching standard soft-float EABI calling convention). In this case the
call-saved vfp registers are not modified by libc code, so normally we
wouldn't have to touch them, but longjmp does need to be able to restore
them since the caller's "softfp" code might have touched them between setjmp
and longjmp.

This is the exact same case as for libunwind, so I don't know why it works there, but not on your case. I can't see any difference in what you describe (and SJLJ in general) that would have a special case where the solution from libunwind won't work.

Fighting with this whole issue has been a huge waste of time for us. It's
really frustrating because all the tools differ in what they accept, and
it's very hard to write asm that works on them all, because they're all
trying to be uselessly pedantic and impose policy rather than just
assembling the mnemonics as documented. :(

I think you're being too harsh. Tools that implement everything every user wants are hard to maintain, document and extend. There's nothing pedantic about reducing maintenance costs.

A month agreeing on a good implementation may be worth a year maintaining a bad one. To users, it may look like a waste of time just because the former is a month of both toolchain engineers and users, while the latter is just toolchain engineers.

Furthermore, you can't expect that the solution you found for your use case is the best solution for all the cases without even looking at them. I don't think it's a good strategy to implement every request without understanding them first, and making sure there is no other way. The GNU folks will take the exact same approach if you ask them to implement a new feature.

Finally, this bug was closed more than a year ago, and we started discussing it again today. So far, we haven't "wasted" more than a few hours. Unfortunately, most of that time was trying to agree on how to develop toolchains, which is not the point here.

So, I recommend you to have a look at the UnwindRegistersRestore/Save.S files and see if that works for you. If not, it would be good to understand why. That library is used from ARMv4 to ARMv8, so if it doesn't work, it's either a bug on the assembler or your code. We just need to figure out which.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 22, 2015

There's nothing pedantic about reducing maintenance costs.

Including extra code to enforce which arbitrary mnemonics you want to accept for which ISA level is not reducing maintenance cost but increasing it. Just accepting the full ISA as defined by the ARM ARM would be so much easier and developers would not have to fight with it.

So far, we haven't "wasted" more than a few hours.

I'm talking about on our side (the musl project). First we were using ldc/stc. Then someone came along and wanted to use clang so we tried the vfp mnemonics, but someone noted that gas didn't accept them, so we used .inst. Then .inst broke old gas so we switched to .word. etc. etc. I've since asked a couple project contributors to research all the different constraints for which mnemonics/directives/options/etc. are supported by which tools in which configurations, trying to find a solution that works everywhere. Many tens of hours have gone into this already between everyone who's worked on the issue.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 22, 2015

So, I recommend you to have a look at the UnwindRegistersRestore/Save

I just looked at that code and I don't see anything it does to suppress the "needs vfp" ABI tag. With that problem solved it would probably be a viable approach.

By using a .arch directive to select an arch that has vfp by default, I'm able to put another .arch directive for armv4t later in the file and that suppresses unwanted ABI tags, but I don't see anywhere this is documented as working. The same does not seem to work for .fpu because there's no ".fpu none" directive as far as I can tell.

BTW the libunwind code seems to have at least one bug: due to conditional compilation of the iwmmx save/restore code, a libunwind that was generated for non-v6 arm will fail to save/restore these registers, even though the caller might be using the. I think that code should be unconditionally compiled and simply used/unused based on hwcap flags.

@rengolin
Copy link
Member

I just looked at that code and I don't see anything it does to suppress the
"needs vfp" ABI tag. With that problem solved it would probably be a viable
approach.

So, I did look into that, and besides having different behaviours, both seem ok, and Clang's seem more correct.

I assume you're talking about:

.eabi_attribute 10, 0

Which disappear if you're using GAS, but in Clang becomes:

  Attribute {
    Tag: 10
    Value: 0
    TagName: FP_arch
    Description: Not Permitted
  }

which is precisely the meaning that the ABI intends it to be.

If that's not what you meant, can you provide a use case?

By using a .arch directive to select an arch that has vfp by default, I'm
able to put another .arch directive for armv4t later in the file and that
suppresses unwanted ABI tags, but I don't see anywhere this is documented as
working.

What kind of document were you expecting?

The same does not seem to work for .fpu because there's no ".fpu
none" directive as far as I can tell.

You can use ".fpu softvfp" to mean no FPU.

BTW the libunwind code seems to have at least one bug: due to conditional
compilation of the iwmmx save/restore code, a libunwind that was generated
for non-v6 arm will fail to save/restore these registers, even though the
caller might be using the. I think that code should be unconditionally
compiled and simply used/unused based on hwcap flags.

That's a separate issue and should be reported separately, but thanks for the heads up. libunwind is tested on multiple architectures by different people, so it's hard to collate all the errors and external patches (if any).

Saleem and Joerg are copied in this bug, they may know more what's the status on older cores.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 5, 2015

Now I'm trying use of the ldc/stc instructions preceded by .arch armv4t, which "should" make them work, but the error is still produced if the arch from the command line is armv7-a or similar. Is this inconsistency intentional?

@rengolin
Copy link
Member

rengolin commented Nov 5, 2015

Now I'm trying use of the ldc/stc instructions preceded by .arch armv4t,
which "should" make them work, but the error is still produced if the arch
from the command line is armv7-a or similar. Is this inconsistency
intentional?

No. .arch, for better or worse, should force the assembler to accept any instruction valid for the requested arch.

Can you send a reduced asm file with the command line you're using?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 5, 2015

Minimal test case:

.arch armv4t
stc p11, cr8, [ip], #​68

Using command: clang --target=arm -march=armv7-a -c -o foo.o foo.s

@rengolin
Copy link
Member

rengolin commented Nov 5, 2015

Minimal test case:

.arch armv4t
stc p11, cr8, [ip], #​68

Using command: clang --target=arm -march=armv7-a -c -o foo.o foo.s

Interesting... I get no error at all... Are you sufficiently close to trunk?

$ clang -mfloat-abi=soft --target=arm -march=armv7-a -c arch.s
$ objdump -d arch.o

arch.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <.text>:
0: ecac8b11 fstmiax ip!, {d8-d15} ;@ Deprecated

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 5, 2015

Using 3.6.2, the latest packaged for my dist. Perhaps it's fixed since then.

@rengolin
Copy link
Member

rengolin commented Nov 6, 2015

Using 3.6.2, the latest packaged for my dist. Perhaps it's fixed since then.

Ah, that explains it. 3.6.2 is based on trunk as of January 2015, and the fix for that went in around Oct, even after 3.7.0 was released, so bad luck.

I think this is a good candidate for back-porting to 3.7.1. The patch is very simple and easy to merge, with zero side effects.

But I don't think we'll do a 3.6.3, so you'd have to move up one release. Would that be feasible?

@rengolin
Copy link
Member

rengolin commented Dec 2, 2015

I haven't heard confirmation, so I didn't merge r214802/r214872 into 3.7.1. We can still keep them as candidates for 3.7.2 or just wait until 3.8.0.

@ismail
Copy link
Contributor

ismail commented Nov 26, 2021

mentioned in issue #20903

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants