This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Improve std::to_chars for base != 10.
ClosedPublic

Authored by Mordante on Mar 1 2021, 10:13 AM.

Details

Summary

This improves the speed of to_chars for bases 2, 8, and 16.
These bases are common and used in <format>. This change
uses a lookup table, like done in base 10 and causes an increase
in code size. The change has a small overhead for the other bases.

Benchmark                             Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------
BM_to_chars_good/2                 -0.9476         -0.9476           252            13           252            13
BM_to_chars_good/3                 +0.0018         +0.0018           145           145           145           145
BM_to_chars_good/4                 +0.0108         +0.0108           104           105           104           105
BM_to_chars_good/5                 +0.0159         +0.0160            89            91            89            91
BM_to_chars_good/6                 +0.0162         +0.0162            80            81            80            81
BM_to_chars_good/7                 +0.0117         +0.0117            72            73            72            73
BM_to_chars_good/8                 -0.8643         -0.8643            64             9            64             9
BM_to_chars_good/9                 +0.0095         +0.0095            60            60            60            60
BM_to_chars_good/10                +0.0540         +0.0540             6             6             6             6
BM_to_chars_good/11                +0.0299         +0.0299            55            57            55            57
BM_to_chars_good/12                +0.0060         +0.0060            48            49            49            49
BM_to_chars_good/13                +0.0102         +0.0102            48            48            48            48
BM_to_chars_good/14                +0.0184         +0.0185            47            48            47            48
BM_to_chars_good/15                +0.0269         +0.0269            44            45            44            45
BM_to_chars_good/16                -0.8207         -0.8207            37             7            37             7
BM_to_chars_good/17                +0.0241         +0.0241            37            38            37            38
BM_to_chars_good/18                +0.0221         +0.0221            37            38            37            38
BM_to_chars_good/19                +0.0222         +0.0223            37            38            37            38
BM_to_chars_good/20                +0.0317         +0.0317            38            39            38            39
BM_to_chars_good/21                +0.0342         +0.0341            38            39            38            39
BM_to_chars_good/22                +0.0336         +0.0336            36            38            36            38
BM_to_chars_good/23                +0.0222         +0.0222            34            35            34            35
BM_to_chars_good/24                +0.0185         +0.0185            31            32            31            32
BM_to_chars_good/25                +0.0157         +0.0157            32            32            32            32
BM_to_chars_good/26                +0.0181         +0.0181            32            32            32            32
BM_to_chars_good/27                +0.0153         +0.0153            32            32            32            32
BM_to_chars_good/28                +0.0179         +0.0179            32            32            32            32
BM_to_chars_good/29                +0.0189         +0.0189            32            33            32            33
BM_to_chars_good/30                +0.0212         +0.0212            32            33            32            33
BM_to_chars_good/31                +0.0221         +0.0221            32            33            32            33
BM_to_chars_good/32                +0.0292         +0.0292            32            33            32            33
BM_to_chars_good/33                +0.0319         +0.0319            32            33            32            33
BM_to_chars_good/34                +0.0411         +0.0410            33            34            33            34
BM_to_chars_good/35                +0.0515         +0.0515            33            34            33            34
BM_to_chars_good/36                +0.0502         +0.0502            32            34            32            34
BM_to_chars_bad/2                  -0.8752         -0.8752            40             5            40             5
BM_to_chars_bad/3                  +0.1952         +0.1952            21            26            21            26
BM_to_chars_bad/4                  +0.3626         +0.3626            16            22            16            22
BM_to_chars_bad/5                  +0.2267         +0.2268            17            21            17            21
BM_to_chars_bad/6                  +0.3560         +0.3559            14            19            14            19
BM_to_chars_bad/7                  +0.4599         +0.4600            12            18            12            18
BM_to_chars_bad/8                  -0.5074         -0.5074            11             5            11             5
BM_to_chars_bad/9                  +0.4814         +0.4814            10            15            10            15
BM_to_chars_bad/10                 +0.7761         +0.7761             2             4             2             4
BM_to_chars_bad/11                 +0.3948         +0.3948            12            16            12            16
BM_to_chars_bad/12                 +0.3203         +0.3203            10            13            10            13
BM_to_chars_bad/13                 +0.3067         +0.3067            11            14            11            14
BM_to_chars_bad/14                 +0.2235         +0.2235            12            14            12            14
BM_to_chars_bad/15                 +0.2675         +0.2675            11            14            11            14
BM_to_chars_bad/16                 -0.1801         -0.1801             7             5             7             5
BM_to_chars_bad/17                 +0.5651         +0.5651             7            11             7            11
BM_to_chars_bad/18                 +0.5407         +0.5406             7            11             7            11
BM_to_chars_bad/19                 +0.5593         +0.5593             8            12             8            12
BM_to_chars_bad/20                 +0.5823         +0.5823             8            13             8            13
BM_to_chars_bad/21                 +0.6032         +0.6032             9            15             9            15
BM_to_chars_bad/22                 +0.6407         +0.6408             9            14             9            14
BM_to_chars_bad/23                 +0.6292         +0.6292             7            12             7            12
BM_to_chars_bad/24                 +0.5784         +0.5784             6            10             6            10
BM_to_chars_bad/25                 +0.5784         +0.5784             6            10             6            10
BM_to_chars_bad/26                 +0.5713         +0.5713             7            10             7            10
BM_to_chars_bad/27                 +0.5969         +0.5969             7            11             7            11
BM_to_chars_bad/28                 +0.6131         +0.6131             7            11             7            11
BM_to_chars_bad/29                 +0.6937         +0.6937             7            11             7            11
BM_to_chars_bad/30                 +0.7655         +0.7656             7            12             7            12
BM_to_chars_bad/31                 +0.8939         +0.8939             6            12             6            12
BM_to_chars_bad/32                 +1.0157         +1.0157             6            13             6            13
BM_to_chars_bad/33                 +1.0279         +1.0279             7            14             7            14
BM_to_chars_bad/34                 +1.0388         +1.0388             7            14             7            14
BM_to_chars_bad/35                 +1.0990         +1.0990             7            15             7            15
BM_to_chars_bad/36                 +1.1503         +1.1503             7            15             7            15

Diff Detail

Event Timeline

Mordante requested review of this revision.Mar 1 2021, 10:13 AM
Mordante created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2021, 10:13 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Mordante planned changes to this revision.Mar 1 2021, 10:14 AM

I think it would make sense to have a general implementation that works for all bases, and then have a few common bases (likely 2, 8, 10, 16 as you say) be compiled in the library. I think the top-level could look something like:

if (base == 2) // then dispatch to the implementation for base 2 in the library
else if (base == 8) // dispatch to base 8
else if (base == 10) // dispatch to base 10
else if (base == 16) // dispatch to base 16
else // general implementation

I'm not saying use if over switch or whatever -- I'm just saying IMO it would make sense to check for the base, optimize for a few common bases (for a modest increase in code size in the dylib), and then fallback to a general implementation available as templates that will be compiled in the user's binaries. WDYT?

So if that wasn't clear, I think I'm saying I agree with you that (3) seems like what we want to do.

So if that wasn't clear, I think I'm saying I agree with you that (3) seems like what we want to do.

Thanks for your feedback! Then I'll proceed with method 3. I'll make a separate review to fix the bug and then use this review to do the optimization for bases 2, 8, and 16.

Mordante updated this revision to Diff 346512.May 19 2021, 10:55 AM

A new version testing some different optimizations options.
Test to see whether it passes a CI run.

Mordante planned changes to this revision.May 19 2021, 10:56 AM
Mordante updated this revision to Diff 346769.May 20 2021, 9:42 AM

Remove tabs. The CI failed on this the last run.

Mordante planned changes to this revision.May 20 2021, 10:52 AM
Mordante updated this revision to Diff 402002.Jan 21 2022, 8:38 AM

Rebased and using the fastest solution.

Mordante edited the summary of this revision. (Show Details)Jan 21 2022, 8:50 AM
Mordante added a reviewer: Quuxplusone.

I think it would make sense to have a general implementation that works for all bases, and then have a few common bases (likely 2, 8, 10, 16 as you say) be compiled in the library. I think the top-level could look something like:

I think moving code to the library will only be temporary. It seems the direction is to make these functions constexpr in C++23. https://wg21.link/p2291r3

libcxx/include/charconv
548–557

Run 1 -> #if 0
Run 2 -> #if 1

Still some scattered #if 0/1s left to remove.

libcxx/include/charconv
329

The red flag here was the : unsigned (I was like, "why does the signedness matter? better study closer") but actually, why do we need an enumerator for this at all? It's used only twice, and in contexts like

      _VSTD::memcpy(__p, &__base_2_lut[4 * __c], 4);
~~~
      unsigned __c = __value % _Base;
      __value /= _Base;
      *--__p = "01"[__c];

The number 2 is hard-coded so many places here, two more won't make a difference (and will save two lines and eliminate a red flag). Ditto in the 8 and 16 cases.

(If this was to appease clang-tidy: Don't appease it. :))

334

I'd remove noexcept from line 331, as it's just noise. (Nobody relies on the noexceptness of this function.) Ditto throughout — I assume there will be other redundant noexcepts.

358

Is Clang smart enough to optimize this into '0' + __c? Would the latter be more readable anyway? (Note there is no issue with EBCDIC here; all charsets must have '0'+1 == '1' according to the standard.)
Ditto line 413.

363

Either constexpr if this is C++11-and-later, or _LIBCPP_CONSTEXPR const, surely?

490

__enable_if_t<sizeof(_Tp) >= sizeof(unsigned)>* = nullptr is the more traditional libc++ism. (enable_if_t if this is C++14-and-later.)

Thanks for the review! The patch is still RFC to see whether this is the direction we want to go. Hence the remaining #if and the lack of polish.

Thanks for the review! The patch is still RFC to see whether this is the direction we want to go. Hence the remaining #if and the lack of polish.

Just to make sure I understand (the review summary is difficult to read for me due to formatting issues), we have a base implementation using __to_chars_itoa, and then we also have implementations specific to other bases that use __to_chars_integral<2> and friends. When those base-specific implementations are enabled, we get roughly 3x performance improvement on those bases, at the cost of < 256 bytes of size for a lookup table. That's right?

If so, I'd say make the optimization non-customizeable and always use it.

Mordante marked 5 inline comments as done.Mar 8 2022, 8:48 AM

Just to make sure I understand (the review summary is difficult to read for me due to formatting issues), we have a base implementation using __to_chars_itoa, and then we also have implementations specific to other bases that use __to_chars_integral<2> and friends. When those base-specific implementations are enabled, we get roughly 3x performance improvement on those bases, at the cost of < 256 bytes of size for a lookup table. That's right?

The size of the lookup table differs, 64-bytes for base 2, 128-bytes for base 8, and 512-bytes for base 16.
The speedup is 14 times for base 2, 3 times for base 8, and 2 times for base 16.

libcxx/include/charconv
329

No it had nothing to do with clang-tidy ;-) This was from an earlier experiment it had a generic form for other integral values. However there the size/speed benefit wasn't great so most of that was removed. Now this is also gone.

358

The entire file uses this style, including places are _Base > 10. So I prefer to keep it as is.

363

This was copy-pasted from another part of this file. D121223 addresses these places.

490

This is the same as other places in this file, so I prefer to keep it.
Note the enable_if specifies the type as int so nullptr isn't an option.

Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2022, 8:48 AM
Quuxplusone added inline comments.Mar 8 2022, 9:50 AM
libcxx/include/charconv
490

This is the same as other places in this file, so I prefer to keep it.

OK.

Note the enable_if specifies the type as int so nullptr isn't an option.

FWIW: The existing enable_if comes out to int = 0, but in my suggested "more traditional libc++ism" rewrite, the __enable_if_t would come out to void* = nullptr instead. I don't see a problem there, so either I'm missing something or you were. (The rewrite would change the mangling, but this is hide-from-abi so we shouldn't care.)

Mordante marked 5 inline comments as done.Mar 12 2022, 4:29 AM
Mordante added inline comments.
libcxx/include/charconv
490

I prefer to keep them in this style to keep the file uniform. It indeed shouldn't be an ABI break, but I rather keep them as-is, just in case there is an issue.

Mordante updated this revision to Diff 414823.Mar 12 2022, 4:46 AM
Mordante marked an inline comment as done.
Mordante edited the summary of this revision. (Show Details)

Rebase and address review comments.

Mordante retitled this revision from [RFC][libc++] Improve std::to_chars for base != 10. to [libc++] Improve std::to_chars for base != 10..Mar 22 2022, 12:03 PM
Mordante edited the summary of this revision. (Show Details)
Mordante edited the summary of this revision. (Show Details)
ldionne accepted this revision.May 13 2022, 11:54 AM

This looks sensible to me.

libcxx/include/charconv
508

I don't think we need _LIBCPP_AVAILABILITY_TO_CHARS here, since this is all implemented in the headers. Or did I miss a dependency on the dylib?

This revision is now accepted and ready to land.May 13 2022, 11:54 AM
philnik added inline comments.
libcxx/include/charconv
546

Does this [[likely]] make any difference?

Mordante updated this revision to Diff 429426.May 14 2022, 2:40 AM

Rebased and address review comments.
Will land when the build passes.

Mordante marked 2 inline comments as done.May 14 2022, 3:59 AM
Mordante added inline comments.
libcxx/include/charconv
508

Good catch!

546

Yes I've measured in the benchmarks that it makes improvements. However I plan to refactor this file to prepare for https://wg21.link/p2291r3 "Add Constexpr Modifiers to Functions to_chars and from_chars for Integral Types in <charconv>" and 128-bit support. When the base 10 implementation moves to the header this attribute may have less benefit. But that requires another round of profiling.

This revision was automatically updated to reflect the committed changes.
Mordante marked 2 inline comments as done.