LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 17122 - clang-format breaks code by splitting a string in a macro call.
Summary: clang-format breaks code by splitting a string in a macro call.
Status: RESOLVED FIXED
Alias: None
Product: clang
Classification: Unclassified
Component: Formatter (show other bugs)
Version: trunk
Hardware: PC Windows NT
: P normal
Assignee: Alexander Kornienko
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-06 10:55 PDT by Scott Determan
Modified: 2013-09-17 04:03 PDT (History)
6 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Scott Determan 2013-09-06 10:55:42 PDT
Assume you have a macro defined as follows:
_T(x) L ## x

Where L"A string" makes wchar_t literal

Now when you have code like this:
_T("Some really long string that needs to be split into multiple lines")

clang-format will output:

_T("Some really long string that needs"
   " to be split into multiple lines")

On visual studio 2010, this appears to be expanded as:
L"Some really long string that needs" " to be split into multiple lines"

and I get an error:
error C2308: concatenating mismatched strings
Comment 1 Richard Smith 2013-09-06 13:45:02 PDT
This is an MSVC bug. Per [lex.string]p13, "If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand."

It's not clear to me how practical it would be for us to work around this bug. =(
Comment 2 Chandler Carruth 2013-09-08 23:51:38 PDT
This is a really common Windows idiom, so I think we should try to support it. It doesn't seem too hard.

Wrap:
  _T("really long string")

To:
  _T("really")
  _T("long")
  _T("string")


And we should recognize it by finding a string literal inside of a '_T(...)' like thing.
Comment 3 Daniel Jasper 2013-09-09 00:22:56 PDT
What exactly is a "_T(...)" like thing? If we are looking for exactly for the identifier _T, this could work. Otherwise, I think we would have to be absolutely sure that it is a macro defined as "L##x", because basically everything else will break with this change.

Seems like a big risk to work around a compiler bug. Alternatively, we could detect this pattern and not insert a break.
Comment 4 Chandler Carruth 2013-09-09 00:47:55 PDT
(In reply to comment #3)
> What exactly is a "_T(...)" like thing? If we are looking for exactly for
> the identifier _T, this could work. Otherwise, I think we would have to be
> absolutely sure that it is a macro defined as "L##x", because basically
> everything else will break with this change.

Yea, I was thinking to look for exactly '_T' preceding a parenthesis.

> Seems like a big risk to work around a compiler bug. Alternatively, we could
> detect this pattern and not insert a break.

My impression is that this is not dissimilar to the _(...) trick used in gettext in that it shows up in lots of places. That's why I thought it might be reasonable and worthwhile to do.... Fixing the compiler bug is both unlikely and slow.
Comment 5 Daniel Jasper 2013-09-09 01:58:28 PDT
As I said, as long as we are looking for exactly "_T", I don't see problems. If not, I would prefer not introducing a line break (which would also "solve" this bug).
Comment 6 Alexander Kornienko 2013-09-12 08:26:27 PDT
(In reply to comment #1)
> This is an MSVC bug. Per [lex.string]p13, "If one string literal has no
> encoding-prefix, it is treated as a string literal of the same
> encoding-prefix as the other operand."
> 
> It's not clear to me how practical it would be for us to work around this
> bug. =(

This behavior conforms with the C++03 standard ([lex.string]p3):
"In translation phase 6 (2.1), adjacent narrow string literals are concatenated and adjacent wide string literals are concatenated. If a narrow string literal token is adjacent to a wide string literal token, the behavior is undefined."
Comment 7 Daniel Jasper 2013-09-17 04:03:30 PDT
AFAICT this is fixed by Alex's r190804.