LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 48280 - ColumnLimit check for trailing comments alignment acts wrong for multi-byte UTF-8
Status: NEW
Alias: None
Product: clang
Classification: Unclassified
Component: Formatter
Version: 11.0
Hardware: PC Linux
Importance: P normal
Assignee: Unassigned Clang Bugs
Reported: 2020-11-24 00:58 PST by Chubanov Kirill
Modified: 2020-11-24 00:58 PST

Description Chubanov Kirill 2020-11-24 00:58:40 PST
.clang-format:
AlignTrailingComments: true
ColumnLimit: 80

What we get after running clang-format:

int a_short;         // some 1-byte UTF8 comment with some good and neat info
int a_veryverylong;  // some one-byte UTF8 comment breaks correctly at 80 char
                     // boundary

int a_short_rus;  // А теперь комментарии, например, на русском, 2-байта
int a_veryverylong_rus;  // Верхний коммент еще не превысил границу в 80, однако
                         // уже отодвинут. Перенос, при этом, отрабатывает верно
-------------

What we should have:

int a_short;         // some 1-byte UTF8 comment with some good info
int a_veryverylong;  // some one-byte UTF8 comment breaks correctly at 80 char
                     // boundary

int a_short_rus;         // А теперь комментарии, например, на русском, 2-байта
int a_veryverylong_rus;  // Верхний коммент еще не превысил границу в 80, однако
                         // уже отодвинут. Перенос, при этом, отрабатывает верно
-------------
The Cyrillic letters in the Russian trailing comments are 2-byte UTF-8 characters, and clang-format apparently measures the comment length in raw bytes when checking whether the aligned line would exceed ColumnLimit. As a result, the comments fall back closer to the code even though there is still enough room to align them.
This affects only the check that triggers trailing comment alignment. Line breaking at the 80-character limit works correctly for multi-byte characters.
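
To illustrate the suspected cause, here is a minimal sketch that measures the first Russian comment from the example both ways. It is not clang-format's actual code: the helper names (byteLength, codePointLength, fitsInColumnLimit, sampleComment) are hypothetical, the source file is assumed to be compiled as UTF-8, and counting code points is only an approximation of display width (it ignores wide and combining characters).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in raw bytes: what a naive check would use.
std::size_t byteLength(const std::string &utf8) { return utf8.size(); }

// Length in code points: count every byte that is NOT a UTF-8
// continuation byte (continuation bytes look like 10xxxxxx).
std::size_t codePointLength(const std::string &utf8) {
  std::size_t count = 0;
  for (unsigned char c : utf8)
    if ((c & 0xC0) != 0x80)
      ++count;
  return count;
}

// Hypothetical stand-in for the alignment check: does a comment of the
// given width, starting at startColumn, stay within columnLimit?
bool fitsInColumnLimit(std::size_t startColumn, std::size_t commentWidth,
                       std::size_t columnLimit) {
  return startColumn + commentWidth <= columnLimit;
}

// The trailing comment of `int a_short_rus;` from the report
// (assumes this source file is encoded as UTF-8).
std::string sampleComment() {
  return "// А теперь комментарии, например, на русском, 2-байта";
}

// Aligned with the comment of `int a_veryverylong_rus;`, the comment
// would start at column 25 ("int a_veryverylong_rus;" plus two spaces).
void selfCheck() {
  const std::size_t alignedColumn = 25;
  const std::size_t columnLimit = 80;
  const std::string comment = sampleComment();

  // Measured in code points, the aligned line fits within the limit...
  assert(fitsInColumnLimit(alignedColumn, codePointLength(comment), columnLimit));
  // ...measured in bytes, it does not, which matches the observed fallback.
  assert(!fitsInColumnLimit(alignedColumn, byteLength(comment), columnLimit));
}
```

With this comment the two measurements disagree: counted in code points the aligned line stays within 80 columns, while counted in bytes it goes well past the limit, which is consistent with the alignment being dropped only for the Cyrillic comments.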