crash with openGL function glRasterPos2i() when using r600 driver if mesa is built with llvm 3.7 libs #25395
Comments
I tried with the svn version of llvm (revision 249579) and the bug is still there. I will try to find the svn revision where the bug was introduced. |
I didn't manage to find the faulty commit: bisecting is very slow on my PC (my CPU is not fast) and there are a lot of svn revisions to test. But I made an interesting discovery: the bug also occurs in a virtual machine (qemu i686, guest OS: archlinux i686, host OS: archlinux 64 bits, CPU: pentium dual core E6800). In this virtual machine the r600 driver is not used; the driver is swrast_dri.so (100% software rendering, no 3D acceleration). All openGL programs crash there (glxgears for example) with the error "illegal instruction".
glxinfo for this qemu VM:
name of display: :0
Xorg log:
[ 13.255] (WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
[ 13.962] (II) Loading /usr/lib/xorg/modules/drivers/vmware_drv.so
The mesa driver is swrast_dri.so and the backtrace is still the same:
Starting program: /usr/bin/glxgears
Program received signal SIGILL, Illegal instruction.
Thread 3 (Thread 0xb3d0db40 (LWP 840)):
Thread 2 (Thread 0xb450eb40 (LWP 839)):
Thread 1 (Thread 0xb7a5f700 (LWP 838)):
It could be a problem in llvm 3.7.0 if it generates faulty code for the intel pentium dual core E6800 CPU. There is no bug if mesa 11.0.3 is linked against the llvm 3.6.2 lib. |
Another discovery: in qemu I can set the CPU type (pentium, pentium2, core2duo, SandyBridge and many more); you can see the list of CPUs with the command "qemu-i386 -cpu ?". Until now I used the qemu option "-cpu host", which means that the host CPU (my pentium dual core E6800) is the one exposed to the guest. Then I decided to set a different CPU name in my qemu script:
-cpu core2duo -enable-kvm -machine type=pc,accel=kvm -smp 2
With this setting the bug disappears and everything works in my virtual machine: glxgears and all openGL programs run without crashing, and the mesa llvmpipe driver doesn't crash. After that I set yet another CPU in qemu:
-cpu Penryn -enable-kvm -machine type=pc,accel=kvm -smp 2 \
With the "Penryn" CPU the bug is back in my virtual machine, which means that the bug seems related to the CPU type. The llvm 3.7.0 lib may have a bug when it generates binary code: it fails with some CPUs. This problem doesn't exist with the llvm 3.6.2 lib. |
I did further tests and found that llvm 3.7.0 sees my pentium dual core CPU as a "Penryn":
$ llc --version | grep CPU
but llvm 3.6.2 (which doesn't have the bug) sees my pentium dual core CPU as a "core2":
$ llc --version | grep CPU
Why this different behaviour? It could explain the bug if llvm 3.7.0 generates binary code for the wrong cpu (penryn instead of core2). |
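For context, host CPU detection of this kind generally keys on the family and model fields that CPUID leaf 1 returns in EAX. Below is a minimal sketch of that decoding, assuming the standard Intel field layout. `CpuSignature`, `decodeSignature` and the example value 0x1067A are illustrative names and values, not taken from LLVM or read from the machine in this report.

```cpp
#include <cstdint>

// Family and model as decoded from EAX of CPUID leaf 1.
// Hypothetical type, introduced only for this illustration.
struct CpuSignature {
    unsigned family;
    unsigned model;
};

// Decode the effective family and model from a raw EAX value.
inline CpuSignature decodeSignature(std::uint32_t eax) {
    unsigned family = (eax >> 8) & 0xF;  // base family, bits 11:8
    unsigned model = (eax >> 4) & 0xF;   // base model, bits 7:4
    if (family == 6 || family == 0xF) {
        // The extended model field (bits 19:16) supplies the high
        // four bits of the model for these families.
        model += ((eax >> 16) & 0xF) << 4;
    }
    if (family == 0xF) {
        // The extended family field (bits 27:20) is added only when
        // the base family is 15.
        family += (eax >> 20) & 0xFF;
    }
    return {family, model};
}
```

With this decoding, a signature such as 0x1067A yields family 6, model 23. A name lookup keyed only on those two numbers cannot tell apart two CPUs that share them but differ in SSE4.1 support.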
The git commit that introduced this bug is:
The problem is that llvm 3.7.0 treats my pentium dual core as a "penryn". Penryn supports SSE4, but the pentium dual core series (CPU family 6 model 23) does not. The faulty commit deleted a test for SSE4:
return HasSSE41 ? "penryn" : "core2";
The solution is simply to add this test back for CPU family 6 model 23. Here is the patch that solves this bug:
--- a/lib/Support/Host.cpp 2015-10-14 07:13:52.381374679 +0200
Add a test for SSE4 for CPU family 6 model 23. This patch solves the bug because pentium dual core CPUs don't have the SSE4 extension, so they need to be treated as "core2" and not "penryn". |
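The check described above can be sketched in isolation as follows. This is only an illustration built around the one-line conditional quoted in the comment; the function name `cpuNameForFamily6Model23` is hypothetical, and the real code lives in a much larger detection routine in lib/Support/Host.cpp.

```cpp
#include <string>

// Hypothetical helper: choose a CPU name for Intel family 6, model 23.
// hasSSE41 is assumed to come from the SSE4.1 feature bit that the
// host reports via CPUID.
inline std::string cpuNameForFamily6Model23(bool hasSSE41) {
    // Parts of this model without SSE4.1 fall back to "core2" so that
    // no SSE4.1 instructions are generated for them.
    return hasSSE41 ? "penryn" : "core2";
}
```

For the pentium dual core in this report, `hasSSE41` would be false and the helper would return "core2", which matches what llvm 3.6.2 reported.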
@Craig Topper: Was the removal of the HasSSE41 conditional intentional? Asking because the commit message only mentions AVX. |
Yes, the removal was intentional. If you're autodetecting the CPU name you should autodetect CPU features as well using getHostCPUFeatures. The AVX problem was clearly worse because we were downgrading Haswell processors without AVX all the way down to Nehalem. This removed not only AVX support, but BMI, LZCNT, RDRND, etc. It also reverted the scheduling model back to Nehalem as well. This would have kept happening going forward, as there will probably always be CPUs that don't support AVX and we couldn't just keep calling them all Nehalem. The case with SSE41 and penryn is not as severely limiting, but I wanted to clean up all such behavior. I'm assuming if you run Mesa on a Sandybridge or Haswell that doesn't have AVX you will get other failures because we don't change the CPU name to "corei7". Can Mesa use the getHostCPUFeatures function to fix this completely? |
For now the mesa developers don't use the "getHostCPUFeatures" function; they use their own feature detection: https://bugs.freedesktop.org/show_bug.cgi?id=92214#c32
The problem is that the mesa developers don't really tell the llvm compiler which cpu features must be used; they assume that llvm will automatically choose the right ones. As I said, I think the problem is that my pentium dual core (cpu family 6 model 23) is treated as "penryn" by llvm 3.7.0, and it seems that the llvm compiler uses the SSE4 extension by default when the cpu name is "penryn" and no cpu feature arguments have been passed to it. The problem disappears when I patch the file /lib/Support/Host.cpp to change how llvm detects my pentium dual core ("core2" instead of "penryn"). I don't know if the problem can be solved easily by the mesa developers. |
Can you try modifying the Mesa patch from Jose to push "-sse4.1" if sse41 is detected as not supported? |
Excellent advice, Craig, it works! It works after adding this to Jose's patch:
+#else
+#endif
So the logic is now to remove the unsupported CPU features (like SSE4) in the list of cpu feature arguments, instead of adding the supported ones? This logic is not really natural for a developer who wants to use the llvm lib. That developer would expect llvm never to use an unsupported cpu feature as long as only supported cpu features are passed to the compiler. That's why I think the main problem is treating the pentium dual core as penryn: a penryn cpu has SSE4, but pentium dual core cpus do not, and it seems that llvm will use SSE4.1 on its own even if the developer didn't explicitly add "+sse4.1" in his source code. In my view llvm should be stricter and more rigorous when it associates a cpu with a cpu name: a cpu name should reflect the cpu features exactly. Maybe you should create a new cpu name that targets cpu family 6 model 23, "dualcore", in order to avoid this SSE4 problem? |
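To make the behaviour discussed here concrete, the sketch below models how a CPU name and a list of "+feature" / "-feature" attributes could combine into an effective feature set. It is a deliberately simplified model and not LLVM's implementation: `impliedFeatures`, `effectiveFeatures` and the short feature table are assumptions made for illustration only.

```cpp
#include <set>
#include <string>
#include <vector>

// Simplified, illustrative table of features implied by a CPU name.
// Only the two names discussed in this thread are modelled.
inline std::set<std::string> impliedFeatures(const std::string &cpu) {
    std::set<std::string> features;
    if (cpu == "core2" || cpu == "penryn") {
        features = {"sse", "sse2", "sse3", "ssse3"};
    }
    if (cpu == "penryn") {
        features.insert("sse4.1");
    }
    return features;
}

// Start from the features implied by the CPU name, then apply each
// "+name" (enable) or "-name" (disable) attribute in order.
inline std::set<std::string> effectiveFeatures(
    const std::string &cpu, const std::vector<std::string> &attrs) {
    std::set<std::string> features = impliedFeatures(cpu);
    for (const std::string &attr : attrs) {
        if (attr.size() < 2) {
            continue;  // ignore malformed entries
        }
        if (attr[0] == '+') {
            features.insert(attr.substr(1));
        } else if (attr[0] == '-') {
            features.erase(attr.substr(1));
        }
    }
    return features;
}
```

In this model, `effectiveFeatures("penryn", {"+sse2"})` still contains "sse4.1" because the name implies it, while `effectiveFeatures("penryn", {"-sse4.1"})` does not. That is the asymmetry the comment above describes: listing only supported features is not enough, the unsupported ones have to be switched off explicitly.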
I must say I find llvm's behavior quite crazy here. Is this really expected? We never tell it the cpu name; llvm figures that out all by itself. Now if it wants to use the scheduling model of a given generation of cpu even though that cpu doesn't actually support all features this generation has, that's fine, but implicitly assuming that all features of this generation are available doesn't make much sense imho. |
I'll agree the interface isn't ideal. Ideally we'd have a function that returned the CPU name AND the feature list. But since we already had the getHostCPUName function and a separate, but empty and unused, getHostCPUFeatures function at the time I fixed the AVX bug, I left them separate. As I've said earlier, the CPU name alone is insufficient due to the way Intel chooses to enable and disable features within a given CPU family. We would need to add a separate CPU name for every possible combination of features Intel may choose to ship within a given family. The behavior of setCPU without providing an explicit feature list is consistent with what you would get if you passed -march=penryn to gcc or clang. Both would enable sse4.1 instructions. At one point in the past setCPU took "native" (or maybe empty) and did all the autodetection correctly. But all of that code was moved to Host.cpp and drivers were made responsible for calling the detection code before calling setCPU. This unfortunately created the AVX problem. As of right now getHostCPUFeatures will detect every feature the x86 backend is aware of and its output can be used to set mattrs correctly. llc and clang are both doing this if -march=native is passed on their command lines. |
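As a rough illustration of turning a detected feature map into an attribute list, here is a self-contained sketch. It uses a plain `std::map<std::string, bool>` as a stand-in for whatever map the detection code fills in, so it does not depend on LLVM headers; the function name `toMAttrs` is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Convert a feature map (name -> supported?) into "+name" / "-name"
// strings. Emitting the "-name" entries matters: they switch off
// features that the CPU name alone would otherwise imply.
inline std::vector<std::string> toMAttrs(
    const std::map<std::string, bool> &features) {
    std::vector<std::string> attrs;
    attrs.reserve(features.size());
    for (const auto &entry : features) {
        attrs.push_back((entry.second ? "+" : "-") + entry.first);
    }
    return attrs;
}
```

For a map containing `{"sse2", true}` and `{"sse4.1", false}`, this produces "+sse2" and "-sse4.1" (in key order, since `std::map` is sorted). A caller would then hand that list to the code generator together with the detected CPU name.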
I notice that gcc 5.2.0 sees my pentium dual core as "core2" (not "penryn") if I use "-march=native":
$ gcc -march=native -Q --help=target | grep march
More interesting: in 2013 someone already opened a bug report here in llvm's bugzilla about the same problem (his pentium dual core was treated as penryn instead of core2 by llvm, which triggered a bug with SSE4.1): https://llvm.org/bugs/show_bug.cgi?id=16721
Benjamin Kramer fixed the problem by adding the "SSE4.1 test" in /lib/Support/Host.cpp: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130729/182469.html |
Turns out gcc doesn't have an -march="penryn". My mistake. So I guess gcc doesn't have a -march option that corresponds to a CPU with sse4.1, but not sse4.2. |
Here are the sse features enabled by default in gcc with -march=native (aka "core2" with my configuration) on a pentium dual core cpu:
$ gcc -march=native -Q --help=target | grep sse
Someone made an interesting suggestion two years ago to solve this problem: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130729/182588.html
The idea is to remove SSE4.1 for Penryn in the lib/Target/X86/X86.td file, because ideally a cpu family name should reflect only the "common" features that ALL cpus in this family share. It would solve all the problems related to SSE4.1 or AVX2 when a CPU doesn't support one of these extensions, because belonging to a CPU family does not mean sharing all of its features. So a restriction should be made in lib/Target/X86/X86.td for some problematic CPU families like penryn. If we apply this idea it gives this:
def : ProcessorModel<"penryn", SandyBridgeModel,
And if the developer wants to know the complete feature set of the host CPU, he can use the function getHostCPUFeatures() to set the mattrs vector correctly. |
Extended Description
I use archlinux 64 bits, graphics card: amd radeon HD4650 PCIe, cpu: intel pentium dual core E6800 3.33 GHz.
When mesa 11.0.2 is built with llvm 3.7.0, a bug occurs with the r600 driver when a program uses the openGL function "glRasterPos2i()". For example, the test program "tunnel" provided by the mesa-demos package crashes with the error "illegal instruction".
flightgear 3.4 also crashes at startup (because it uses the glRasterPos2i() function).
The workaround is to build mesa 11.0.2 with the previous version of LLVM (3.6.2): there is no bug if LLVM 3.6.2 and llvm-libs 3.6.2 are used during the build of mesa 11.0.2.
I first created a bug report on the mesa website:
https://bugs.freedesktop.org/show_bug.cgi?id=92214
I thought mesa 11.0.2 was the culprit, but in fact llvm 3.7.0 is the real culprit.
Something is wrong in llvm 3.7.0.
I also use gcc-multilib 5.2.0-2 and glibc 2.22-3; maybe the combination of glibc 2.22-3 and llvm 3.7.0 is not good.