This is an archive of the discontinued LLVM Phabricator instance.

[libc++][cuchar] Declare std::c8rtomb and std::mbrtoc8 in <cuchar> if available.
ClosedPublic

Authored by tahonermann on Aug 1 2022, 2:58 PM.

Details

Summary

This change implements the C library dependent portions of P0482R6 (char8_t: A type for UTF-8 characters and strings (Revision 6)) by declaring std::c8rtomb() and std::mbrtoc8() in the <cuchar> header when implementations are provided by the C library as specified by WG14 N2653 (char8_t: A type for UTF-8 characters and strings (Revision 1)) as adopted for C23.

A _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8 macro is defined by the libc++ __config header unless it is known that the C library provides these functions in the current compilation mode. This macro is used for testing purposes and may be of use to libc++ users. At present, the only C library known to implement these functions is GNU libc as of its 2.36 release.

Diff Detail

Event Timeline

tahonermann created this revision.Aug 1 2022, 2:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2022, 2:58 PM
tahonermann requested review of this revision.Aug 1 2022, 2:58 PM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald TranscriptAug 1 2022, 2:58 PM
tahonermann planned changes to this revision.Aug 2 2022, 8:46 PM

Reverting to draft status to address CI failures.

tahonermann edited the summary of this revision. (Show Details)

Replaced the _LIBCPP_HAS_C8RTOMB_MBRTOC8 macro with a negated version, _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8 to follow the existing patterns present for absent declarations of fgetpos(), fsetpos(), and fgets().

Conditioned the std namespace declarations in <cuchar> on _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8. Though the declarations specify _LIBCPP_USING_IF_EXISTS, that magic only works with Clang; the declarations need to be appropriately conditioned for use/testing of libc++ with gcc.

Added test libcxx/test/std/strings/c.strings/no_c8rtomb_mbrtoc8.verify.cpp to validate that _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8 is correctly set for the C library under test.

Introduced a TEST_HAS_NO_C8RTOMB_MBRTOC8 macro in libcxx/test/support/test_macros.h. Though not strictly necessary, this is consistent with the existence of _LIBCPP_HAS_NO_FGETPOS_FSETPOS.

tahonermann planned changes to this revision.Aug 3 2022, 11:03 AM

Planning changes to address CI build failures.

Corrected formatting and checking for a definition of _LIBCPP_GLIBC_PREREQ before attempting to use it.

@hubert.reinterpretcast, the AIX builds are failing the libcxx/test/std/strings/c.strings/no_c8rtomb_mbrtoc8.verify.cpp test with the following:

error: 'error' diagnostics seen but not expected:
  File /usr/include/uchar.h Line 38: cannot combine with previous 'type-name' declaration specifier
  File /usr/include/uchar.h Line 39: cannot combine with previous 'type-name' declaration specifier
2 errors generated.

The existing libcxx/test/std/strings/c.strings/cuchar.compile.pass.cpp test is marked XFAIL for AIX, presumably because it fails with the same errors.
I'll be happy to try to fix the problem (assuming it isn't a problem with the system header itself) if I can either get access to an AIX environment or get a copy of /usr/include/uchar.h to look at.

Marked test libcxx/test/std/strings/c.strings/no_c8rtomb_mbrtoc8.verify.cpp as XFAIL on AIX.

@hubert.reinterpretcast, the AIX builds are failing the libcxx/test/std/strings/c.strings/no_c8rtomb_mbrtoc8.verify.cpp test with the following:

error: 'error' diagnostics seen but not expected:
  File /usr/include/uchar.h Line 38: cannot combine with previous 'type-name' declaration specifier
  File /usr/include/uchar.h Line 39: cannot combine with previous 'type-name' declaration specifier
2 errors generated.

The existing libcxx/test/std/strings/c.strings/cuchar.compile.pass.cpp test is marked XFAIL for AIX, presumably because it fails with the same errors.
I'll be happy to try to fix the problem (assuming it isn't a problem with the system header itself) if I can either get access to an AIX environment or get a copy of /usr/include/uchar.h to look at.

The issue is that uchar.h is not C++-friendly on the platform. The error is because it tries to define char16_t and char32_t as typedefs.

The issue is that uchar.h is not C++-friendly on the platform. The error is because it tries to define char16_t and char32_t as typedefs.

Ah, thanks, @hubert.reinterpretcast! I marked the test as XFAIL.
CI builds now pass; this is ready for review.

Ah, thanks, @hubert.reinterpretcast! I marked the test as XFAIL.

Thank you, Tom; sounds good.

@ldionne, could you or a suitable delegate review this when time permits? (I understand if you are busy finalizing the LLVM 15 release).

Any libc++ members available to review? @ldionne or @Mordante?

Thanks for working on this! Here are some comments, but this looks mostly good already.

libcxx/include/__config
1203

Does this complete the implementation of P0482R6? If yes, please update the status page and feature test macros. Otherwise, what is still missing?

libcxx/include/cuchar
28–29
63–66

I think this should also be guarded with _LIBCPP_STD_VER >= 20.

libcxx/include/uchar.h
14

As a C++ implementation we are interested in which C++ version something got added. So please use that version instead of in which C version something got added. Same for the rest of the synopsis.

Thank you for the review @philnik! I'll post an update addressing some of the comments shortly. Responses included for other comments.

libcxx/include/__config
1203

I expect that this does complete P0482R6, but I'll double check, report back, and update the status page accordingly.

With regard to updating feature test macros, I don't think there is anything to be done. P0482R6 only specifies the single __cpp_lib_char8_t feature test macro for the library. libc++ already defines that macro based on the prior P0482R6 implementation efforts and the recent implementation of P1423 bumped its value. Though P0482R6 is not explicit about this (either in prose or wording), it can be inferred that the feature test macro is not applicable to these declarations as it is not required to be defined by the <cuchar> or uchar.h headers. Additionally, since c8rtomb() and mbrtoc8() are expected to be provided by the C library, entangling the macro definition with the availability of those declarations would get messy. This is consistent with the behavior of libstdcxx.

libcxx/include/cuchar
63–66

That is an option, but leads to scenarios where the declarations are available in the global namespace via uchar.h but not in the std namespace via <cuchar>. I thought it made more sense to keep these in sync.

At present, libc++ omits the other char8_t related declarations (e.g., u8string) in C++20 mode when char8_t support is disabled (of course) but likewise omits them in pre-C++20 modes when char8_t support is enabled. What I would prefer to do (in a separate patch) is to match the libstdcxx behavior of declaring all of the char8_t related declarations based on char8_t enablement rather than on the C++ standard version. This would give users more flexibility for evaluating and addressing`char8_t` related migration concerns before migrating to C++20.

libcxx/include/uchar.h
14

Things get kind of messy here. The declarations in <cuchar> were added in C++20. Since C++20 depends on C17, no version of C++ will require the uchar.h declarations until a future C++ standard that is based on C23 (which will be required by ISO to happen for C++26). Our choices are:

  1. Omit these declarations pending a future C++ standard.
  2. Annotate these declarations as "since C++20" since that will reflect existing practice (e.g., glibc will declare them in uchar.h in C++20 mode) and the declarations are needed to satisfy the std namespace declarations in <cuchar>.
  3. Optimistically annotate these as "since C++26" or "expected in C++26".

I'm inclined toward option 2.

tahonermann added inline comments.Aug 21 2022, 3:09 PM
libcxx/include/__config
1203

I audited the P0482R6 wording and found evidence of declarations for everything except the polymorphic allocator related declarations:

  • std::pmr::u8string
  • std::hash<std::pmr::u8string>

This appears to be expected given that P0220R1 is still listed as in-progress in libcxx/docs/Status/Cxx17Papers.csv and that https://reviews.llvm.org/D89057 is still awaiting revisions and approvals.

I think the current state plus this patch suffices to consider P0482R6 complete. I'll update the status page to mark it complete with a note that the missing declarations are pending completion of polymorphic allocator support.

philnik added inline comments.Aug 21 2022, 5:25 PM
libcxx/include/__config
1203

Then we can't consider it complete yet. It would be great though if you could mark it as Partial and add a note that only the std::pmr::u8string alias and std::hash<std::pmr::u8string> are missing.

libcxx/include/cuchar
63–66

Hmm. Having the declarations in uchar.h but not in cuchar would indeed be quite weird. OTOH having the declarations pre-C++20 would be an extension in the C library, which we might not want to support. I also suspect that you will find quite strong opposition to enabling char8_t pre-C++20. We are quite anti-extensions and have even removed a number of them over the last years. @ldionne Do you have any thoughts here?

libcxx/include/uchar.h
14

Yes, options 2 sounds like the right thing.

tahonermann added inline comments.Aug 22 2022, 1:46 PM
libcxx/include/__config
1203

That works; will do.

Addressed code review comments.

tahonermann marked 3 inline comments as done.Aug 22 2022, 3:30 PM

This is again ready for review pending successful CI builds and a decision on whether the std namespace declarations should be protected by _LIBCPP_STD_VER >= 20.

tahonermann added inline comments.Aug 28 2022, 8:56 PM
libcxx/include/cuchar
63–66

@ldionne, could you please share your thoughts on whether the std::c8rtomb() and std::mbrtoc8() declarations should be restricted to C++20 and later or provided consistent with the corresponding C declarations in the global namespace?

ldionne added inline comments.Sep 2 2022, 7:52 AM
libcxx/include/cuchar
63–66

My preference would be to be strict (and hence not include them before C++20). That being said, it seems like we have a strong precedent everywhere in our C compatibility headers to include functions inside std:: whenever they are provided by the underlying C library. So I think the current approach is OK.

Is there a reason why we even need _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8 at all? We have _LIBCPP_USING_IF_EXIST to eliminate the need for exactly this sort of stuff. Of course GCC doesn't implement it, however it would still work when compiling using Clang on a platform that uses glibc.

tahonermann added inline comments.Sep 2 2022, 4:06 PM
libcxx/include/cuchar
63–66

My first patch relied solely on _LIBCPP_USING_IF_EXIST, but that lead to CI build failures as can be seen at https://reviews.llvm.org/harbormaster/unit/view/4733761/. It seems that all such unconditional uses of _LIBCPP_USING_IF_EXIST are only exercised in CI builds where the dependent declarations are present. I followed the precedent that appears to be in place for other such problematic cases; see _LIBCPP_HAS_NO_FGETPOS_FSETPOS for example.

ldionne added inline comments.Sep 8 2022, 1:25 PM
libcxx/include/cuchar
63–66

_LIBCPP_HAS_NO_FGETPOS_FSETPOS predates the existence of _LIBCPP_USING_IF_EXIST. https://reviews.llvm.org/harbormaster/unit/view/4733761/ is a failure on GCC, which I would indeed expect because GCC does not implement the using_if_exists attribute.

I guess my question is: would you see an issue with supporting <cuchar> only on Clang, or on GCC when a suitable Glibc is present? Glibc 2.36 is pretty recent, so that would mean only on Clang at least for some time. IMO this is acceptable since libc++'s primary compiler is Clang.

tahonermann added inline comments.Sep 8 2022, 6:08 PM
libcxx/include/cuchar
63–66

I don't have a sense of how many people use libc++ with gcc. Since libc++'s <cuchar> does work with gcc today, removing support for it could legitimately be considered a regression.

A solution that limits <cuchar> support depending on the presence of a suitable Glibc release would, I think, look quite similar to what is implemented in this patch, so I don't see that as being an improvement. I looked for existing cases where a libc++ header is explicitly not supported for use with gcc and didn't find one. Are there any such existing cases?

ldionne accepted this revision.Sep 9 2022, 6:29 AM

Let's go with this.

I'd normally fight against the _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8, but in this case that will preclude libc++ from being usable on GCC with essentially all version of glibc, since c8rtomb was added to it less than a month ago.

But just for the record, our normal policy is that compilers that don't support using_if_exists are only supported on platforms that implement a full C standard library, which would mean (if taken at the letter) that GCC isn't supported on Glibc < 2.36. In practice, let's make this work, but we're not going to start adding carve-outs for ancient C libraries (that's what I want to avoid by giving this rationale).

This revision is now accepted and ready to land.Sep 9 2022, 6:29 AM

Let's go with this.

Thank you! I'll proceed with landing the changes as-is.

I'd normally fight against the _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8, but in this case that will preclude libc++ from being usable on GCC with essentially all version of glibc, since c8rtomb was added to it less than a month ago.

Understood. Note that this same issue will arise if/when Microsoft adds support for these functions to their C library implementation. Unless I'm mistaken, Microsoft doesn't offer a __attribute__((__using_if_exists__)) equivalent for Visual C++.

But just for the record, our normal policy is that compilers that don't support using_if_exists are only supported on platforms that implement a full C standard library, which would mean (if taken at the letter) that GCC isn't supported on Glibc < 2.36. In practice, let's make this work, but we're not going to start adding carve-outs for ancient C libraries (that's what I want to avoid by giving this rationale).

Understood. It might be worth establishing a support policy for various C library implementations. If, for example, libc++ only promised support for glibc versions less than 4 years old, then we could remove the _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8 and related conditionals in 4 years (subject to use with other C libraries).

This change was responsible for the failure of the sanitizer-ppc64be-linux5955 build. The build failed with the following error:

/home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm_build64/include/c++/v1/__config:1222:67: error: function-like macro '__GLIBC_USE' is not defined
#    if _LIBCPP_GLIBC_PREREQ(2, 36) && (defined(__cpp_char8_t) || __GLIBC_USE(ISOC2X))
                                                                  ^

It seems that this CI configuration is run with a relatively ancient glibc version earlier than 2.25 (2.25 was released on 2017-02-01 and is the version that introduced the __GLIBC_USE macro). For the short term, I'm going to commit the following trivial change to allow the build to pass.

diff --git a/libcxx/include/__config b/libcxx/include/__config
index a6f3b4e88aa1..681c10306420 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -1218,7 +1218,7 @@ _LIBCPP_BEGIN_NAMESPACE_STD _LIBCPP_END_NAMESPACE_STD
 // determining the latter depends on internal GNU libc details. If the
 // __cpp_char8_t feature test macro is not defined, then a char8_t typedef
 // will be declared as well.
-#  if defined(_LIBCPP_GLIBC_PREREQ)
+#  if defined(_LIBCPP_GLIBC_PREREQ) && defined(__GLIBC_USE)
 #    if _LIBCPP_GLIBC_PREREQ(2, 36) && (defined(__cpp_char8_t) || __GLIBC_USE(ISOC2X))
 #      undef _LIBCPP_HAS_NO_C8RTOMB_MBRTOC8
 #    endif

However, investigating this lead me to realize that the check for __GLIBC_USE(ISOC2X) will not work long term. Eventually, ISOC2X will be renamed to ISOC23 and this check will stop working. We have a couple of options available:

  1. We can remove the __GLIBC_USE(ISOC2X)) check. This check is only relevant when builtin support for char8_t is disabled (when __cpp_char8_t is not defined). If this check is removed, then we'll have the situation that c8rtomb() and mbrtoc8() will be declared in the global namespace but not in the std namespace when targeting c++17 or earlier without -fchar8_t (or equivalent) or when targeting C++20 or later with -fno-char8_t (or equivalent).
  2. We can assume that ISOC2X will be renamed to ISOC23 and check for __GLIBC_USE(ISOC2X) || __GLIBC_USE(ISOC23).

I'm inclined toward option 1 for two reasons: 1) use of glibc internal details like this is inherently fragile, and 2) There is no need for the C++ library to expose the C23 char8_t related declarations when builtin char8_t support is disabled.

I'm inclined toward option 1 for two reasons: 1) use of glibc internal details like this is inherently fragile, and 2) There is no need for the C++ library to expose the C23 char8_t related declarations when builtin char8_t support is disabled.

A review that implements option 1 has been submitted as https://reviews.llvm.org/D142588.