LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 37398 - Incorrect std::type_info comparison and hash_code for pointer to incomplete type
Summary: Incorrect std::type_info comparison and hash_code for pointer to incomplete type
Status: REOPENED
Alias: None
Product: libc++
Classification: Unclassified
Component: All Bugs (show other bugs)
Version: unspecified
Hardware: All All
: P normal
Assignee: Eric Fiselier
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-09 12:56 PDT by Volodymyr Sapsai
Modified: 2019-10-02 17:07 PDT (History)
6 users (show)

See Also:
Fixed By Commit(s):


Attachments
Repro that gets std::type_info in different translation units and compares them. (1004 bytes, application/zip)
2018-05-09 12:56 PDT, Volodymyr Sapsai
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Volodymyr Sapsai 2018-05-09 12:56:31 PDT
Created attachment 20275 [details]
Repro that gets std::type_info in different translation units and compares them.

Steps to reproduce:
1. With typeid get std::type_info for a pointer to incomplete type in one translation unit.
2. Get std::type_info for the same type in different translation unit.
3. Compare obtained type_info.
4. Compare hashes of obtained type_info.

Small repro case is attached.

Actual result:
Types are considered to be different, hash codes are different. std::type_info::name() are equal by value, char* pointers are different.

Expected result:
Types should be equal, hash codes should be equal.

Comments:
In this case clang emits type info with internal linkage [1] which is correct according to my understanding of Itanium C++ ABI. Also ABI documentation [2] tells

> After linking and loading, only one std::type_info structure is accessible
> via the external name defined by this ABI for any particular complete type
> symbol (see Vague Linkage). Therefore, except for direct or indirect pointers
> to incomplete types, the equality and inequality operators can be written as
> address comparisons when operating on those type_info objects: two type_info
> structures describe the same type if and only if they are the same structure
> (at the same address). However, in the case of pointer types, directly or
> indirectly pointing to incomplete class types, a more complex comparison is
> required, described below with the RTTI layout of pointer types.

So to me it looks like libc++ bug.

[1] https://github.com/llvm-mirror/clang/blob/6a37651bb30339e60ea617d510b4703bcd2e89e8/lib/CodeGen/ItaniumCXXABI.cpp#L3013-L3023
[2] http://refspecs.linuxbase.org/cxxabi-1.83.html#rtti
Comment 1 Eric Fiselier 2018-05-09 16:29:34 PDT
OK, there are a couple of potential solutions here.

(1) Always use strcmp for ordering, and hash the string contents (not it's address).

(2) Only use strcmp/string hashing when the type-info refers to an incomplete pointer type. This info is available in the private type-info class __pbase_type_info. 

(3) Change clang to emit the type info string as linkonce_odr when the type-info refers to a pointer-to-incomplete type.

(1) will be a lot more expensive in most cases, (2) will be trickier to implement, and will likely require introducing a new symbol into the libc++ dylib, so it's ABI breaking.

I'm proposing we go with option (3). @John McCall, could you offer your opinion on this solution? I've put the changes up for review as https://reviews.llvm.org/D46665.
Comment 2 Richard Smith 2018-05-09 16:56:29 PDT
I believe (2) requires that we use string hashing for all pointer types that could possibly point to an incomplete type, not only for ones that actually do (we need the pointer-to-incomplete and pointer-to-complete versions for the same type to hash the same). Probably the easiest way to do that would be to treat all pointer type_info as non-unique. :(
Comment 3 John McCall 2018-05-09 18:27:12 PDT
Surely all pointer type_infos are non-unique anyway?  Non-unique type_infos are just for the fundamental types and polymorphic class types.
Comment 4 John McCall 2018-05-09 18:29:16 PDT
(In reply to John McCall from comment #3)
> Surely all pointer type_infos are non-unique anyway?  Non-unique type_infos
> are just for the fundamental types and polymorphic class types.

Of course I mean *unique* type_infos are just for fundamental types and polymorphic class types.
Comment 5 Eric Fiselier 2018-05-09 18:48:09 PDT
@John, Why are uniqued type-info's only for those types, and not pointer types or member pointer types?
Comment 6 John McCall 2018-05-09 18:54:03 PDT
Because we're not going to eagerly emit a bunch of composite type_infos every time we emit a type?  The ABI says that we emit a type_info eagerly for polymorphic class types when we emit their v-tables because the typeid operator demands that we can go from an instance to the appropriate type_info, and frankly most C++ programmers hate that.  We're not going to emit the type_info for the class *and* a type_info for a pointer to the class.
Comment 7 Eric Fiselier 2018-05-09 19:01:03 PDT
> Because we're not going to eagerly emit a bunch of composite type_infos every time we emit a type? 

This isn't a problem about eager emission. In these cases the user is forcing the emission of the type-info, for example by using typeid.

Here's an example reproducer:
// header.h
struct T;
extern std::type_info const& Info;

// tu_one.cpp
#include "header.h"
std::type_info const& Info = typeid(T*);

// tu_two.cpp
#include "header.h"
struct T {};
int main() {
  auto &TI1 = Info;
  auto &TI2 = typeid(T*);
  assert(TI1 == TI2); // Fails
  assert(TI1.hash_code() == TI2.hash_code()); // Fails
}
Comment 8 John McCall 2018-05-09 19:23:29 PDT
Unique type info emission relies on there being a unique place that emits the type info.  Any translation unit can perform a typeid of a public type, so typeid emission locations are not unique.
Comment 9 Eric Fiselier 2018-05-09 19:32:15 PDT
Perhaps we're talking past each other?

By "unique" i mean what [Itanium]p2.9.3 says when it states:
> After linking and loading, only one std::type_info structure is accessible via the external name defined by this ABI for any particular complete type symbol (see Vague Linkage). 

[1] https://itanium-cxx-abi.github.io/cxx-abi/abi.html#typeid
Comment 10 Eric Fiselier 2018-05-09 19:39:57 PDT
Ignore that last comment. It was misleading since the quoted section only places requirements on completed types. (Though it still requires their type info be deduplicated)

What I meant to cite was right below it:

> The name() member function returns the address of an NTBS,
> unique to the type

And

> In a flat address space (such as that of the Itanium architecture), 
> the operator==, operator!=, and before() members are easily implemented
> in terms of an address comparison of the name NTBS.
Comment 11 John McCall 2018-05-09 20:06:22 PDT
Sorry, I was thinking of various non-standard ABIs (including GCC's current ABI, IIRC, but also Darwin ARM64) which have concepts of non-unique RTTI.  That's my fault; I should have taken a second to remind myself what the Itanium ABI actually says.

The Itanium type_info algorithm assumes that the RTTI name is globally uniqued.  For a pointer to a non-internal incomplete type, Clang should emitting the type_info symbol (_ZTI*) with internal linkage and the type name symbol (_ZTS*) with weak external linkage (in LLVM terms, linkonce_odr).  It looks like Clang is emitting the type name symbol with internal linkage, which is incorrect.
Comment 12 Eric Fiselier 2018-05-09 20:08:47 PDT
Awesome, glad you agree!

I've put up a Clang patch that does just that!
https://reviews.llvm.org/D46665
Comment 13 Eric Fiselier 2018-05-10 13:44:32 PDT
Fixed in r332028 (Clang 7.0).
Comment 14 Eric Fiselier 2018-05-30 10:56:29 PDT
Reopening. The Clang changes have been reverted.