LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 20025 - llvm-mc rejected LDC and STC for coprocessors P10 and P11 in generic coprocessor form
Summary: llvm-mc rejected LDC and STC for coprocessors P10 and P11 in generic coproces...
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: ARM (show other bugs)
Version: trunk
Hardware: PC All
: P normal
Assignee: Renato Golin
URL:
Keywords:
Depends on:
Blocks: 20529
  Show dependency tree
 
Reported: 2014-06-13 02:25 PDT by Andrey Kuharev
Modified: 2015-12-08 08:48 PST (History)
8 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrey Kuharev 2014-06-13 02:25:11 PDT
llvm-mc rejected LDC and STC instruction for coprocessors P10 and P11
in generic coprocessor form:
LDC	p10,  ...
STC	p11,  ...
that used in open source libraries to enable VFP instruction
compilation for more targets.
Library may be compiled without VFP support enabled.

gcc compile this generic coprocessor form.

Problem in ARMAsmParser.cpp
P10 and P11 rejected in 
static int MatchCoprocessorOperandName(StringRef Name, char CoprocOp)
    // p10 and p11 are invalid for coproc instructions (reserved for FP/NEON)
Comment 1 Renato Golin 2014-08-04 14:19:11 PDT
This is what caused half of PR20529's problems.
Comment 2 Renato Golin 2014-08-04 15:49:58 PDT
Duplicating my comments...

So, these co-processor instructions were allowed (and somewhat encouraged) in v5/v6 processors due to the poor nature of VFP support in those days, but v7 and v8 explicitly recommend using vector instructions instead.

I haven't found an explicit prohibition in the ARM ARM (either v7 or v8) but Richard has stated some strong facts about it earlier, so I'm inclined to continue the trend and forbid cp10/cp11 on those architectures.

I'll allow them on v5/v6, but for v7/v8 libc++abi's unwind libraries will *have* to use vector loads and stores.
Comment 3 Renato Golin 2014-08-04 18:23:39 PDT
This problem is fixed in r214802 for ARMv5 and ARMv6 cores, but not v7 and v8, since in those, access to cp10 and cp11 was deprecated and vector instructions should be used instead.
Comment 4 Rich Felker 2015-10-21 18:35:30 PDT
Leaving it broken for v7 and v8 is not a sufficient fix. This issue has been affecting musl libc for a long time. For setjmp/longjmp, we need to be able to use forms of vstmia/vldmia that:

1. Do not cause the object file to be ABI-tagged as needing VFP.
2. Assemble even on targets that don't have VFP.
3. Assemble to ARM or Thumb2 properly depending on target.

Right now we're stuck with using .word to meet requirements 1 and 2, which is utterly awful, and trying to find a suitable way to add support for Thumb2-only code generation.
Comment 5 Renato Golin 2015-10-22 07:57:05 PDT
(In reply to comment #4)
> Leaving it broken for v7 and v8 is not a sufficient fix.

I don't think this is broken in any way. It's not because GCC does it one way that it's the right way. It's not because user code has gotten cosy with GCC's way that every other compiler has to implement identically.

You're also assuming that, by using another compiler, you should get the exact same results as GCC, thus there is no need for you to change your code, your build system, anything. If that's the case, why change compilers in the first place?

However, if there is a bug, or a complication, we should investigate. But from the reports here, I can't tell if there really is a complication, or you're just expecting Clang to mimic GCC, because you haven't shared the problem, just what you think is the right solution.


> 1. Do not cause the object file to be ABI-tagged as needing VFP.
> 2. Assemble even on targets that don't have VFP.

Can't you use .arch/.fpu for those cases, like we do on UnwindRegisterRestore/Save.S?


> 3. Assemble to ARM or Thumb2 properly depending on target.

I'm not sure what you want here...

If you want to emit Thumb/ARM depending on targets, you can use the -mthumb flag. Shouldn't be too hard to adapt build systems to behave.

Or do you want the same ASM source to be converted into ARM or Thumb?
Comment 6 Rich Felker 2015-10-22 09:50:10 PDT
GCC is not involved; this is an asm source file. The stc and ldc instructions are documented in the ARM ARM and should be accepted by an assembler.

The problem we have is that all the assemblers are being overly (and in my opinion, erroneously) restrictive about what instructions and what mnemonics they accept. We need to be able to write instructions that will only be executed conditionally at runtime (based on hwcap), and that are accepted no matter what ISA level the user has selected to target. Presently the only way we can do this is with .inst or .word, and we're actually using the latter because some assemblers don't support the former.

Does clang have any option we can pass to the assembler to tell it "accept any instruction that's valid for any ISA level"? That would solve the problem, I think.

> Can't you use .arch/.fpu for those cases, like we do on UnwindRegisterRestore/Save.S?

Actually this did not work when I first tried it on clang 3.5, but it's fixed now on clang 3.6. However this also requires some method of suppressing the ABI tag Tag_FP_arch. On gas, setting it to 0 with .eabi_attribute causes it not to be emitted at all, but clang's assembler emits a tag with a value of 0. I'm not sure if this is harmful or not but it's at least mildly undesirable.

> Or do you want the same ASM source to be converted into ARM or Thumb?

Yes, we want the same file in UAL to assemble to either ARM or Thumb2. Short of any other solution we might have to switch to preprocessed asm files and #ifdef __thumb__ or whatever and use .word, but I'd rather not do that, because it's the same instruction (just with different encoding) either way.
Comment 7 Renato Golin 2015-10-22 13:57:14 PDT
(In reply to comment #6)
> GCC is not involved; this is an asm source file.

Sorry, I meant GNU tools, including the assembler.


> The stc and ldc
> instructions are documented in the ARM ARM and should be accepted by an
> assembler.

Our assembler does support both instructions. However, there was a choice of not supporting CP10/CP11 on ARMv7+ due to its compulsory replacement by VFP/SIMD counterparts, as well as it not being supported in AArch64 (only AArch32).

That was not a strong technical decision, so there is room for change. But it has to be a change that is worth the trouble. I don't want to just start adding things because that's how a lot of broken legacy ended up in (and can't be removed from) the GNU toolchain.

The ARM ARM says three things:
 1. Architectures that implement VFP/SIMD *have* to provide access to CP10/CP11
 2. ARMv7 tools should favour VFP/SIMD instructions over LPC/STC
 3. AArch64 doesn't support STD/LDC at all, AArch32 (v8) only supports CP10/11/14/15.


> The problem we have is that all the assemblers are being overly (and in my
> opinion, erroneously) restrictive about what instructions and what mnemonics
> they accept.

The documents don't help either. The ARM ARM has a lot of what can be done, but less of what can't, or has to be deprecated (on purpose, to not burden legacy toolchains). They also focus more on the architecture than the tools, which is correct, but doesn't help our predicament. :)

The decision to only include what's explicit in LLVM is exactly to not have to carry more burdens than we need.


> We need to be able to write instructions that will only be
> executed conditionally at runtime (based on hwcap), and that are accepted no
> matter what ISA level the user has selected to target.

This sounds a lot like IFUNC, on an instruction level. While LDC/STC can give you that, its not true for all instructions, and support for multiple architectures on the same binary will have to be controlled some other way.


> Does clang have any option we can pass to the assembler to tell it "accept
> any instruction that's valid for any ISA level"? That would solve the
> problem, I think.

No, and that has been discussed already and the conclusion is that we won't. The GNU assembler can cope with a ridiculous amount of rubbish, and that has created a lot of problems for us and them.

Our stance is that we want to fail as soon as possible, as accurate as possible, and fix the false positives as we go. So far, it has worked well, so there's no reason to stop now.


> Actually this did not work when I first tried it on clang 3.5, but it's
> fixed now on clang 3.6. However this also requires some method of
> suppressing the ABI tag Tag_FP_arch. On gas, setting it to 0 with
> .eabi_attribute causes it not to be emitted at all, but clang's assembler
> emits a tag with a value of 0. I'm not sure if this is harmful or not but
> it's at least mildly undesirable.

That sounds like a bug, either on GAS or our assembler. If that affects you, I'd encourage you to open another bug with some way to reproduce it, so we can investigate what's the expected correct behaviour, and liaise with the GNU folks on solving it the best way on both sides.


> Yes, we want the same file in UAL to assemble to either ARM or Thumb2. Short
> of any other solution we might have to switch to preprocessed asm files and
> #ifdef __thumb__ or whatever and use .word, but I'd rather not do that,
> because it's the same instruction (just with different encoding) either way.

I'm not sure you can do that in a generic case. Of course, if you write the assembly by hand, you can be careful to get all the ranges and sizes correct, and only use the instructions that can be encoded on both ARM and T2, but I find that solution problematic, as it's prone to tool misbehaviour.

I'm not sure it's clear, but UAL doesn't mean whatever you write will be encoded in either ARM or Thumb, it just means that the mnemonics, register names and general syntax are the same. So, if you use a large enough immediate, you won't be able to assemble to Thumb, no matter if it's UAL or not.

So, from an engineering point of view, the safest way to proceed is to make sure your files are safe from any compiler/assembler behaviour that implement the correct instruction set. Using #ifdef guards where needed and .arch/.fpu to change local support seems like a sensible way forward.
Comment 8 Rich Felker 2015-10-22 15:14:52 PDT
> That was not a strong technical decision, so there is room for change. But it has to be a change that is worth the trouble.

Right now clang is going out of its way to forbid use of the mnemonic based on -march. Changing this is a simplification.

As it stands, as far as I can tell, ARM is the _only_ target where valid instruction mnemonics that used to be accepted suddenly get rejected with new -march. This is really frustrating because it means anyone writing code is writing to a moving target. There is no issue with the actual instructions not working on the hardware (if there were, old binaries would break) but rather just a gratuitous policy/naming issue.

> While LDC/STC can give you that, its not true for all instructions, and support for multiple architectures on the same binary will have to be controlled some other way.

The code in question is part of setjmp/longjmp, and the reason it's needed is that it's possible to have a soft-float (standard EABI) libc but call it from code linked with the "softfp" model (using vfp instructions, but matching standard soft-float EABI calling convention). In this case the call-saved vfp registers are not modified by libc code, so normally we wouldn't have to touch them, but longjmp does need to be able to restore them since the caller's "softfp" code might have touched them between setjmp and longjmp.

Fighting with this whole issue has been a huge waste of time for us. It's really frustrating because all the tools differ in what they accept, and it's very hard to write asm that works on them all, because they're all trying to be uselessly pedantic and impose policy rather than just assembling the mnemonics as documented. :(
Comment 9 Renato Golin 2015-10-22 15:55:33 PDT
(In reply to comment #8)
> The code in question is part of setjmp/longjmp, and the reason it's needed
> is that it's possible to have a soft-float (standard EABI) libc but call it
> from code linked with the "softfp" model (using vfp instructions, but
> matching standard soft-float EABI calling convention). In this case the
> call-saved vfp registers are not modified by libc code, so normally we
> wouldn't have to touch them, but longjmp does need to be able to restore
> them since the caller's "softfp" code might have touched them between setjmp
> and longjmp.

This is the exact same case as for libunwind, so I don't know why it works there, but not on your case. I can't see any difference in what you describe (and SJLJ in general) that would have a special case where the solution from libunwind won't work.


> Fighting with this whole issue has been a huge waste of time for us. It's
> really frustrating because all the tools differ in what they accept, and
> it's very hard to write asm that works on them all, because they're all
> trying to be uselessly pedantic and impose policy rather than just
> assembling the mnemonics as documented. :(

I think you're being too harsh. Tools that implement everything every user wants are hard to maintain, document and extend. There's nothing pedantic about reducing maintenance costs.

A month agreeing on a good implementation may be worth a year maintaining a bad one. To users, it may look like a waste of time just because the former is a month of both toolchain engineers and users, while the latter is just toolchain engineers.

Furthermore, you can't expect that the solution you found for your use case is the best solution for all the cases without even looking at them. I don't think it's a good strategy to implement every request without understanding them first, and making sure there is no other way. The GNU folks will take the *exact* same approach if you ask them to implement a new feature.

Finally, this bug was closed more than a year ago, and we started discussing it again today. So far, we haven't "wasted" more than a few hours. Unfortunately, most of that time was trying to agree on how to develop toolchains, which is not the point here.

So, I recommend you to have a look at the UnwindRegistersRestore/Save.S files and see if that works for you. If not, it would be good to understand why. That library is used from ARMv4 to ARMv8, so if it doesn't work, it's either a bug on the assembler or your code. We just need to figure out which.
Comment 10 Rich Felker 2015-10-22 16:12:30 PDT
> There's nothing pedantic about reducing maintenance costs.

Including extra code to enforce which arbitrary mnemonics you want to accept for which ISA level is not reducing maintenance cost but increasing it. Just accepting the full ISA as defined by the ARM ARM would be so much easier and developers would not have to fight with it.

> So far, we haven't "wasted" more than a few hours.

I'm talking about on our side (the musl project). First we were using ldc/stc. Then someone came along and wanted to use clang so we tried the vfp mnemonics, but someone noted that gas didn't accept them, so we used .inst. Then .inst broke old gas so we switched to .word. etc. etc. I've since asked a couple project contributors to research all the different constraints for which mnemonics/directives/options/etc. are supported by which tools in which configurations, trying to find a solution that works everywhere. Many tens of hours have gone into this already between everyone who's worked on the issue.
Comment 11 Rich Felker 2015-10-22 16:27:10 PDT
> So, I recommend you to have a look at the UnwindRegistersRestore/Save

I just looked at that code and I don't see anything it does to suppress the "needs vfp" ABI tag. With that problem solved it would probably be a viable approach.

By using a .arch directive to select an arch that has vfp by default, I'm able to put another .arch directive for armv4t later in the file and that suppresses unwanted ABI tags, but I don't see anywhere this is documented as working. The same does not seem to work for .fpu because there's no ".fpu none" directive as far as I can tell.

BTW the libunwind code seems to have at least one bug: due to conditional compilation of the iwmmx save/restore code, a libunwind that was generated for non-v6 arm will fail to save/restore these registers, even though the caller might be using the. I think that code should be unconditionally compiled and simply used/unused based on hwcap flags.
Comment 12 Renato Golin 2015-10-23 03:13:20 PDT
(In reply to comment #11)
> I just looked at that code and I don't see anything it does to suppress the
> "needs vfp" ABI tag. With that problem solved it would probably be a viable
> approach.

So, I did look into that, and besides having different behaviours, both seem ok, and Clang's seem more correct.

I assume you're talking about:

.eabi_attribute 10, 0

Which disappear if you're using GAS, but in Clang becomes:

      Attribute {
        Tag: 10
        Value: 0
        TagName: FP_arch
        Description: Not Permitted
      }

which is precisely the meaning that the ABI intends it to be.

If that's not what you meant, can you provide a use case?


> By using a .arch directive to select an arch that has vfp by default, I'm
> able to put another .arch directive for armv4t later in the file and that
> suppresses unwanted ABI tags, but I don't see anywhere this is documented as
> working.

What kind of document were you expecting?


> The same does not seem to work for .fpu because there's no ".fpu
> none" directive as far as I can tell.

You can use ".fpu softvfp" to mean no FPU.


> BTW the libunwind code seems to have at least one bug: due to conditional
> compilation of the iwmmx save/restore code, a libunwind that was generated
> for non-v6 arm will fail to save/restore these registers, even though the
> caller might be using the. I think that code should be unconditionally
> compiled and simply used/unused based on hwcap flags.

That's a separate issue and should be reported separately, but thanks for the heads up. libunwind is tested on multiple architectures by different people, so it's hard to collate all the errors and external patches (if any).

Saleem and Joerg are copied in this bug, they may know more what's the status on older cores.
Comment 13 Rich Felker 2015-11-05 14:29:10 PST
Now I'm trying use of the ldc/stc instructions preceded by .arch armv4t, which "should" make them work, but the error is still produced if the arch from the command line is armv7-a or similar. Is this inconsistency intentional?
Comment 14 Renato Golin 2015-11-05 14:43:19 PST
(In reply to comment #13)
> Now I'm trying use of the ldc/stc instructions preceded by .arch armv4t,
> which "should" make them work, but the error is still produced if the arch
> from the command line is armv7-a or similar. Is this inconsistency
> intentional?

No. .arch, for better or worse, should force the assembler to accept *any* instruction valid for the requested arch.

Can you send a reduced asm file with the command line you're using?
Comment 15 Rich Felker 2015-11-05 14:46:08 PST
Minimal test case:

.arch armv4t
stc p11, cr8, [ip], #68

Using command: clang --target=arm -march=armv7-a -c -o foo.o foo.s
Comment 16 Renato Golin 2015-11-05 14:51:05 PST
(In reply to comment #15)
> Minimal test case:
> 
> .arch armv4t
> stc p11, cr8, [ip], #68
> 
> Using command: clang --target=arm -march=armv7-a -c -o foo.o foo.s

Interesting... I get no error at all... Are you sufficiently close to trunk?

$ clang -mfloat-abi=soft --target=arm -march=armv7-a -c arch.s
$ objdump -d arch.o

arch.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <.text>:
   0:	ecac8b11 	fstmiax	ip!, {d8-d15}	;@ Deprecated
Comment 17 Rich Felker 2015-11-05 15:23:38 PST
Using 3.6.2, the latest packaged for my dist. Perhaps it's fixed since then.
Comment 18 Renato Golin 2015-11-06 06:57:21 PST
(In reply to comment #17)
> Using 3.6.2, the latest packaged for my dist. Perhaps it's fixed since then.

Ah, that explains it. 3.6.2 is based on trunk as of January 2015, and the fix for that went in around Oct, even after 3.7.0 was released, so bad luck.

I think this is a good candidate for back-porting to 3.7.1. The patch is very simple and easy to merge, with zero side effects.

But I don't think we'll do a 3.6.3, so you'd have to move up one release. Would that be feasible?
Comment 19 Renato Golin 2015-12-02 06:46:18 PST
I haven't heard confirmation, so I didn't merge r214802/r214872 into 3.7.1. We can still keep them as candidates for 3.7.2 or just wait until 3.8.0.