Bugzilla – Bug 761
TargetData needs to be parameterized better in the .bc/.ll files
Last modified: 2007-01-27 13:24:58
You need to log in before you can comment on or make changes to this bug.
Currently, when 'opt' is run, it configures and adds a TargetData to the pass manager based on the settings of the "target" flags in the Module. These flags indicate whether the target is 32/64 bit, and whether it is big or little endian. This has two problems: 1. If the module has no target info, we default to SparcV9 (this is Bug 760). 2. Our mapping from ptrsize/endianness to a full targetdata is a total hack. #2 is true because a TargetData has many more fields than just ptrsize and endianness, thus the mapping from these two fields to a full TargetData is inherently a few-to-many mapping, which is impossible to be right in the general case (we have it carefully rigged to work for our current targets though). Instead of encoding ptrsize/endianness into the module, it would be far better to encode a whole target data into it. I propose that we do this defining a "default" target data, then encoding any diffs from this into the module as a string. For example, if the default was a 64-bit, big endian, target with 8-byte aligned doubles, a target like X86 could be described as "p3eLd4". As TargetData is extended in the future, more letters could be gracefully added without breaking old .bc files. -Chris
Oh yeah, TargetData should have a ctor that takes the string form, and should have a method that produces it from a given TargetData. Among other things (e.g. correctness on new targets), this would allow (for example) the TargetMachine ctors to pass in a short constant string to their TargetData, instead of passing 50 random options. -Chris
Owen, is this done or nearly done?
After mistakenly trying to implement bc read/write for this, I had to back it out because there's a lot more work to be done: 1. The bcwriter needs to get rid of calls to getEndianness and getPointerSize and not set that information in the module header. Instead it should call getDataLayout and write that string. 2. The reader needs to be updated to get rid of calls to setEndianness and setPointerSize and ignore the bits in the module header for them. Instead it needs to setDataLayout. 3. The parsing routines in Module.cpp for this stuff are not very clean nor optimal. In particular, setEndianness appends to the end of what it assumes is an empty string. THis isn't true after bcreader calls setDataLayout. 4. The Interpreter.cpp needs to compute the data layout for the machine its running on and call setDataLayout instead of setEndianness and setPointerSize 5. CloneModule needs to copy dataLayout, not endianness/pointer size. 6. LinkModules needs to deal with this correctly too. 7. There may be other stuff too.
One more thing: The setDataLayout function should be changed like this: - void setDataLayout(std::string DL) { DataLayout = DL; } + void setDataLayout(const std::string& DL) { DataLayout = DL; }
Created an attachment (id=585) [details] Implementation of DataLayout in bytecode
Created an attachment (id=586) [details] Patch for llvm-gcc4
Created an attachment (id=587) [details] Patch for testsuite
I've attached the necessary patches for this bug. These can't go in, however, until PR645 is resolved so that support for upgrading can go into the new llvm-upgrade.
This has now been implemented. The patches for the code have been applied. The patch for llvm-gcc4 now needs to be committed. The patch for the testsuite was partially applied. In some cases it was preferable to leave the old endian/pointersize specifications in and let llvm-upgrade upgrade them.
Owen, You're still on the hook for the bytecode format changes in the documentation. Reid.
This is done. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070122/043374.html