Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::basic_string wrong capacity calculation when sizeof(CharT)=3, 5, or >8 #51158

Open
georgthegreat opened this issue Sep 10, 2021 · 6 comments
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

Comments

@georgthegreat
Copy link
Contributor

georgthegreat commented Sep 10, 2021

Bugzilla Link 51816
Version unspecified
OS Windows NT
Attachments Test using gtest
CC @mclow

Extended Description

An attempt to compile std::basic_string with various weird character types would lead to a multiple issues.

So far I have found the following:

  1. stack allocation size would grow due to overallocation, i. e:
if constexpr (sizeof(Char) == 12) {
    EXPECT_EQ(sizeof(s), 40u);
}
  1. allocations would become power-of-two-exponential, i. e:
for (size_t i = 0; i < 47u; ++i) {
    s.push_back({});
}

if constexpr (sizeof(Char) <= 4) {
	// OK, do not request power-of-two blocks from allocator 
	EXPECT_EQ(
		s.capacity(),
		// capacity grows exponentially starting from 22 bytes
		47u
	);
} else if constexpr (sizeof(Char) == 8) {
	// Bad 
	EXPECT_EQ(
		s.capacity(),
		// capacity grows exponentially as the power-of-two
		63u
	);
}
  1. Memory leak is detected by ASan when both -fsized-deallocation is enabled and stable ABI is used (i. e. _LIBCPP_ABI_VERSION == 2):
==1260518==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x6080000004a0 in thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   84 bytes;
  size of the deallocated type: 72 bytes.
    #&#8203;0 0x4cdc02 in void std::__y1::__libcpp_operator_delete<void*, unsigned long>(void*, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/new:245:3
    #&#8203;1 0x4cdc02 in void std::__y1::__do_deallocate_handle_size<>(void*, unsigned long) include/new:271:10
    #&#8203;2 0x4cdc02 in std::__y1::__libcpp_deallocate(void*, unsigned long, unsigned long) include/new:285:14
    #&#8203;3 0x4cdc02 in std::__y1::allocator<WeirdChar12>::deallocate(WeirdChar12*, unsigned long) include/memory:850:13
    #&#8203;4 0x4cdc02 in std::__y1::allocator_traits<Allocator<WeirdChar12> >::deallocate(Allocator<WeirdChar12>&, WeirdChar12*, unsigned long) include/__memory/allocator_traits.h:484:14

This is caused by incorrect alignment in __recommend() and attempt to allocate odd number of characters. String capacity() must be even as odd bit will be overwritten by __long_mask.

Though C++ standard does not limit sizeof(CharT), I suggest static_assert(sizeof(CharT) <= 4) to be added into libc++.
Though this could break existing codebases, I consider the break fast to be the better option.

I do not think SSO is anyway helpful whenever sizeof(CharT) >= 8. Maybe we should ensure that std::basic_string is always long is such cases.

@georgthegreat
Copy link
Contributor Author

Stable ABI should be

i. e. _LIBCPP_ABI_VERSION == 1

of course

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@Quuxplusone
Copy link
Contributor

The first two points don't seem like bugs to me. It's totally normal and expected for sizeof(some_template<X>) != sizeof(some_template<Y>), even when some_template is basic_string.

The third point, "memory leak detected by ASan," seems like a bug, but I wasn't able to reproduce it from your description. Could you post a minimal compilable example? Include the complete compiler command line (e.g. clang++ -fsized-deallocation -fsanitize=address main.cpp), eliminate the gtest dependency, and make the main as short as possible. Presumably it should be just one line, since the problem happens during destruction (IIUC).

I tried to reproduce by doing this:

cat >main.cpp <<EOF
#include <string>
struct Char12 { int a, b, c; };
int main() {
    std::basic_string<Char12> s(N, {});
}
EOF
for i in `seq 1 40`; do bin/clang++ -fsized-deallocation -fsanitize=address main.cpp -DN=$i ;  ./a.out ; done

However, this test passes just fine — no errors detected at all.

#30802 is probably related.

@georgthegreat
Copy link
Contributor Author

georgthegreat commented Dec 30, 2021

@Quuxplusone, the test was attached to the original Bugzilla issue.
It's still accessible, and still reports the leak, as cited above.
It also contains some diagnostics which I believe should make the debugging easier.

Here is even more minimal reproducer:

#include <cstdint>
#include <string>

struct WeirdChar12 {
    uint32_t a;
    uint32_t b;
    uint32_t c;
};

int main() {
    using Char = WeirdChar12;
    using String = std::basic_string<Char>;
    String s;

    for (size_t i = 0; i < 47u; ++i) {
        s.push_back({});
    }
    return 0;
}

It fails as follows:

=================================================================
==1207262==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x608000000020 in thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   84 bytes;
  size of the deallocated type: 72 bytes.
    #0 0x3fca02 in void std::__y1::__libcpp_operator_delete<void*, unsigned long>(void*, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/new:245:3
    #1 0x3fca02 in void std::__y1::__do_deallocate_handle_size<>(void*, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/new:271:10
    #2 0x3fca02 in std::__y1::__libcpp_deallocate(void*, unsigned long, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/new:285:14
    #3 0x3fca02 in std::__y1::allocator<WeirdChar12>::deallocate(WeirdChar12*, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/__memory/allocator.h:115:13
    #4 0x3fca02 in std::__y1::allocator_traits<std::__y1::allocator<WeirdChar12> >::deallocate(std::__y1::allocator<WeirdChar12>&, WeirdChar12*, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/__memory/allocator_traits.h:282:13
    #5 0x3fca02 in std::__y1::basic_string<WeirdChar12, std::__y1::char_traits<WeirdChar12>, std::__y1::allocator<WeirdChar12> >::__grow_by(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/string:2358:9
    #6 0x3fc218 in std::__y1::basic_string<WeirdChar12, std::__y1::char_traits<WeirdChar12>, std::__y1::allocator<WeirdChar12> >::push_back(WeirdChar12) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/string:2698:9
    #7 0x3fc218 in main /home/thegeorg/arcadia/junk/thegeorg/string_wide_char/main.cpp:16:11
    #8 0x7f55633f00b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
    #9 0x3fbdf8 in _start (/home/thegeorg/.ya/build/symres/3f09618e05c7203852d3a3ad9564f03b/a.out+0x3fbdf8)

0x608000000020 is located 0 bytes inside of 84-byte region [0x608000000020,0x608000000074)
allocated by thread T0 here:
    #0 0x3fc647 in void* std::__y1::__libcpp_operator_new<unsigned long>(unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/new:235:10
    #1 0x3fc647 in std::__y1::__libcpp_allocate(unsigned long, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/new:261:10
    #2 0x3fc647 in std::__y1::allocator<WeirdChar12>::allocate(unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/__memory/allocator.h:106:38
    #3 0x3fc647 in std::__y1::allocator_traits<std::__y1::allocator<WeirdChar12> >::allocate(std::__y1::allocator<WeirdChar12>&, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/__memory/allocator_traits.h:262:20
    #4 0x3fc647 in std::__y1::basic_string<WeirdChar12, std::__y1::char_traits<WeirdChar12>, std::__y1::allocator<WeirdChar12> >::__grow_by(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/string:2347:19
    #5 0x3fc218 in std::__y1::basic_string<WeirdChar12, std::__y1::char_traits<WeirdChar12>, std::__y1::allocator<WeirdChar12> >::push_back(WeirdChar12) /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/string:2698:9
    #6 0x3fc218 in main /home/thegeorg/arcadia/junk/thegeorg/string_wide_char/main.cpp:16:11
    #7 0x7f55633f00b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: new-delete-type-mismatch /home/thegeorg/arcadia/contrib/libs/cxxsupp/libcxx/include/new:245:3 in void std::__y1::__libcpp_operator_delete<void*, unsigned long>(void*, unsigned long)
==1207262==HINT: if you don't care about these errors you may set ASAN_OPTIONS=new_delete_type_mismatch=0
==1207262==ABORTING

@georgthegreat
Copy link
Contributor Author

georgthegreat commented Dec 30, 2021

I believe, that in order to reproduce the bug, you have to switch between SSO-string and allocating one.
Then the capacity computation will become wrong thus making the deallocation size smaller.

@Quuxplusone
Copy link
Contributor

Ah, now I see it! -fsized-deallocation was a red herring. Confirmed. https://godbolt.org/z/4aME1zh33

cat >main.cpp <<EOF
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

std::vector<int> allocated_sizes;

template<class T>
struct Alloc {
    using value_type = T;
    T *allocate(size_t n) {
        printf("%s -- %zu\n", __PRETTY_FUNCTION__, n);
        allocated_sizes.push_back(n);
        return (T*)malloc(n * sizeof(T));
    }
    void deallocate(T *p, size_t n) {
        printf("%s -- %zu\n", __PRETTY_FUNCTION__, n);
        auto it = std::find(allocated_sizes.begin(), allocated_sizes.end(), n);
        assert(it != allocated_sizes.end());
        allocated_sizes.erase(it);
    }
};

template<int N> struct WeirdChar { char a[N]; };

int main() {
    using Char = WeirdChar<TESTN>;
    std::basic_string<Char, std::char_traits<Char>, Alloc<Char>> s;
    for (int i = 0; i < 100; ++i) {
        s.push_back({});
    }
}
EOF
for i in `seq 1 10`; do bin/clang++ -std=c++20 main.cpp -DTESTN=$i ; ./a.out ; done

Fails for TESTN equal to 3, 5, and apparently anything greater than 8, as @georgthegreat reported). Sample output:

T *Alloc<WeirdChar<3>>::allocate(size_t) [T = WeirdChar<3>] -- 17
T *Alloc<WeirdChar<3>>::allocate(size_t) [T = WeirdChar<3>] -- 35
void Alloc<WeirdChar<3>>::deallocate(T *, size_t) [T = WeirdChar<3>] -- 16
Assertion failed: (it != allocated_sizes.end()), function deallocate, file main.cpp, line 19.
Abort trap: 6

@Quuxplusone Quuxplusone changed the title std::basic_string misbehavior when sizeof(CharT) > 8 std::basic_string wrong capacity calculation when sizeof(CharT)=3, 5, or >8 Dec 30, 2021
@georgthegreat
Copy link
Contributor Author

I have tried to debug this by myself, but the code is too hard to understand without comments, so I have ended with adding static_assert(sizeof<Char> <= 4) in our copy of libc++.

So far, this worked just fine as 32 bits is enough for everyone.

@Quuxplusone Quuxplusone added the confirmed Verified by a second party label Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

No branches or pull requests

2 participants