In order to link armv4 Linux kernel images with LLD, it's required that LLD support --fix-v4bx, otherwise we get the warning: >> ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected IIUC, there's even a relocation type for this: R_ARM_V4BX. https://github.com/ClangBuiltLinux/linux/issues/964
There's two separate issues here: 1. Supporting armv4t. 2. Supporting armv4 without thumb. Supporting armv4t requires some modifications to the relocation handling for calls; that's what the warning is about. Supporting armv4 without thumb means we need to use a different sequence for function returns. Historically, this is a compiler thing, not a linker thing, but R_ARM_V4BX is a trick to allow writing code that works for both armv4t and armv4-without-thumb targets. --fix-v4bx essentially converts arm-mode armv4t code to armv4. If you're not using --fix-v4bx, the linker doesn't really care about the difference between armv4t and armv4; it doesn't generate function returns, anyway.
(The documentation says the support is for >=v6.) Isn't armv4 so obscure that it it quite irrelevant now? Even if it isn't so obscure, are these users happy with their current toolchain and don't consider a switch?
armv5t has both the register and immediate forms of blx, I think. I think the issue here is there are some legacy hardware enthusiasts who build Linux for very old hardware, like https://en.wikipedia.org/wiki/Risc_PC , so legacy configs targeting armv4 persist in the kernel source tree, and randconfig stumbles over them.
There are three classes of ARMv4 (not T) users that I can see from a kernel perspective: - RiscPC/sa110 has only one actual user (rmk) I'm aware of, and it has the added problem that the hardware does not support 32-bit memory accesses, so kernels are actually built for ARMv3 despite this being an ARMv4 processor. gcc-9 removed ARMv3 support a few years ago, and I see no reason to add this to clang. As soon as gcc-9 or higher is required for building kernels, this platform will be removed. - There are a handful of hobbyist users of legacy StrongARM machines that run ARMv4 kernels on SA1100 handhelds or Netwinder. We will probably discuss removing those as well when time is up for RiscPC, but they could remain for a few years longer. - There are two remaining embedded platforms based on Faraday FA526 cores that are deployed commercially in settings that require kernel updates: Cortina Gemini and MOXA Art. The hardware is old, but the kernel ports are well written and maintained, they will probably outlive a number of ARMv4T and ARMv5E platforms we currently support. It would be nice to be able to build kernels for these using clang. For reference, we are still adding support for ARMv5E (ARM926E) SoC platforms, including SoCs designed as recently as 2020 (Microchip SAM9X60). For ARMv4T, we currently support AT91RM9200, S3C24xx, i.MX1, EP72xx, EP93xx, OMAP1, and ARM Integrator based on ARM920T and ARM710T. Debian Buster (released in 2019) dropped support for these, but there is still some work going into the kernel for at least ep93xx, omap1 and s3c24xx, similar to the ARMv4/FA526 platforms.
On the LLD side I originally implemented v7-A and above then later added v5 as there was at least one person had some need for it, I think on the BSD side. It probably wouldn't be a huge amount of extra code and tests to support v4t but up until now there hasn't been anyone needing it in LLD. The assumption was that older hardware would already have toolchains they could use, and we could avoid extra complexity in the code. For v4t support in lld, there are a few things we'd need to do: * We'd need a new v4t thunks as ldr pc, [address] cannot change state in v4t. The sequence becomes: ldr ip, [address] bx ip * As there is no BLX there is no Thunk sharing between Arm and Thumb state. The v4t Thunks isCompatibleWith would need to include only relocations from the same state. * ARM::needsThunk() would need a case for v4t that accounts for the lack of a BLX. I think it would be something like (pseudo code): return isV4t || !inBranchRange(type, branchAddr, dst + a); If the Thunks code has done its job properly then the relocation handling code should never be presented with an opportunity to use a BLX. On the linker side R_ARM_V4BX is trivial to implement (just write Arm state mov pc, lr encoding to the location), but as Eli points out the compiler has to generate the relocation when targeting Arm v4. If there is a need for support then it may be worth approaching Linaro TCWG. I could do it, but I've not got a lot of spare time so it might take a few weeks rather than a few days.
(In reply to Peter Smith from comment #5) > * As there is no BLX there is no Thunk sharing between Arm and Thumb state. > The v4t Thunks isCompatibleWith would need to include only relocations from > the same state. The first thing any v4t Thumb thunk will do is switch to ARM mode. The remaining code is the same, so it could be shared. Not sure that's worth doing, though.
Also, the kernel doesn't actually have any thumb code, so it doesn't actually need v4t thunks, I think. It would be enough to just suppress the warning if there aren't any Thumb symbols.
> The first thing any v4t Thumb thunk will do is switch to ARM mode. The > remaining code is the same, so it could be shared. Not sure that's worth > doing, though. It is possible, although it would mean two entry points for thunk that we'd need to select between per relocation type. My instinct would be to keep it simple unless there was a real need to minimise the code-size. > Also, the kernel doesn't actually have any thumb code, so it doesn't > actually need v4t thunks, I think. It would be enough to just suppress the > warning if there aren't any Thumb symbols. That would be possible, although at the moment that would require LLD to scan all the local symbols of every input object looking for the absence of $t. Although only programs that could trigger the warning would need to run the check. We can't use the BuildAttributes as objects built with -marm still have Tag_THUMB_ISA_use=1 (Thumb instructions were permitted to be used) Would a suppression of the warning be acceptable to the kernel folk? Or are there v4t kernels that use Thumb code that need supporting?
(In reply to Peter Smith from comment #8) > > That would be possible, although at the moment that would require LLD to > scan all the local symbols of every input object looking for the absence of > $t. Although only programs that could trigger the warning would need to run > the check. We can't use the BuildAttributes as objects built with -marm > still have Tag_THUMB_ISA_use=1 (Thumb instructions were permitted to be used) > > Would a suppression of the warning be acceptable to the kernel folk? Or are > there v4t kernels that use Thumb code that need supporting? I'm sure there is no thumb code in pre-v7 kernels that we need to link against. Suppressing the warning would seem to do the trick for armv4t kernels, but if I build for armv4, I still see 'bx' instructions in the thunks. I checked the compiler output from a recent clang and I see that building for armv4 does not generate any 'bx' instructions, so I don't think we actually need the R_ARM_V4BX/--fix-v4bx trick any more to work around compiler-generated instructions, but we still need to work around the linker adding them. Could we just have a linker flag to ask for armv4 style thunks, and use them for both v4 and v4t kernels?
It does look like (ARMInstrInfo.td MOVPCLR https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/ARMInstrInfo.td#L2443) the return sequence is predicated so clang/llvm will use mov pc, lr for -march=armv4 so in theory no compiler work required. We can tell if all objects are ArmV4 from the build attributes. Wouldn't need a new command line option. Looking at LLD I'd expect that the only thunks needed for an Arm v4 kernel would be long range Arm to Arm. If --pic-veneer is set then these will have bx lr in them. However if --pic-veneer isn't set I'd expect to see all the thunks to be of the form: ldr pc, [pc, #-4] ; l1 l1: .word <destination> In theory these should be OK for v4. May be worth trying a build without --pic-veneer (The kernel wouldn't be position independent though). I think a new PI Thunk for v4 would look something like: ldr ip, [pc, #-8] ; l1 add pc, ip, pc ; not interworking but OK for v4 as there is no Thumb l1: .word destination -4 - l1 ; offset to destination Although if we're adding thunks for armv4, it may just be worth adding v4t thunks at the same time.
(In reply to Peter Smith from comment #10) > It does look like (ARMInstrInfo.td MOVPCLR > https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/ > ARMInstrInfo.td#L2443) the return sequence is predicated so clang/llvm will > use mov pc, lr for -march=armv4 so in theory no compiler work required. Maybe it was the 'blx mcount' when building with '-pg' that caused the problems? I remember debugging into that in the past, but don't remember the exactly problem and this seems fine now, it always uses 'bl mcount' here with the clang version I see. > We can tell if all objects are ArmV4 from the build attributes. Wouldn't > need a new command line option. > > Looking at LLD I'd expect that the only thunks needed for an Arm v4 kernel > would be long range Arm to Arm. Correct > If --pic-veneer is set then these will have > bx lr in them. However if --pic-veneer isn't set I'd expect to see all the > thunks to be of the form: > ldr pc, [pc, #-4] ; l1 > l1: .word <destination> > > In theory these should be OK for v4. May be worth trying a build without > --pic-veneer (The kernel wouldn't be position independent though). Yes, that works. I see the --pic-veneer flag was added by Ard in https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8323/1 to work around veneers that are inserted into the early-boot code that is run before the MMU is enabled. We can probably get away with this on actual ARMv4/v4t hardware since their memory is too constrained to actually run a kernel that is large enough to require veneers, but it seems a little wrong. In theory one can run a large (allyesconfig style) kernel with ARMv4 enabled on an ARMv5 machine with a lot of memory. > I think a new PI Thunk for v4 would look something like: > ldr ip, [pc, #-8] ; l1 > add pc, ip, pc ; not interworking but OK for v4 as there is no Thumb > l1: .word destination -4 - l1 ; offset to destination > > Although if we're adding thunks for armv4, it may just be worth adding v4t > thunks at the same time. That would be nice for user space, but in the kernel we don't really care about the distinction. I suggested a command line flag since we could then just unconditionally pick the v4 veneers for linking a v4/v4t kernel and avoid both the bx instructions and the warning.