LLVM  6.0.0svn
llvm::sys::unicode Namespace Reference

Enumerations

enum  ColumnWidthErrors { ErrorInvalidUTF8 = -2, ErrorNonPrintableCharacter = -1 }
 

Functions

bool isPrintable (int UCS)
 Determines if a character is likely to be displayed correctly on the terminal.
 
int columnWidthUTF8 (StringRef Text)
 Gets the number of positions the UTF8-encoded Text is likely to occupy when output on a terminal ("character width").
 
static int charWidth (int UCS)
 Gets the number of positions a character is likely to occupy when output on a terminal ("character width").
 

Enumeration Type Documentation

◆ ColumnWidthErrors

Enumerator
ErrorInvalidUTF8 
ErrorNonPrintableCharacter 

Definition at line 24 of file Unicode.h.

Function Documentation

◆ charWidth()

static int llvm::sys::unicode::charWidth ( int  UCS)
inline static

Gets the number of positions a character is likely to occupy when output on a terminal ("character width").

This depends on the implementation of the terminal, and there's no standard definition of character width. The implementation defines it in a way that is expected to be compatible with a generic Unicode-capable terminal.

Returns
Character width:
  • ErrorNonPrintableCharacter (-1) for non-printable characters (as identified by isPrintable);
  • 0 for non-spacing and enclosing combining marks;
  • 2 for CJK characters excluding halfwidth forms;
  • 1 for all remaining characters.
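The four return cases above can be sketched as a small standalone classifier. This is only an illustration of the documented ordering, not the LLVM implementation: the names `charWidthSketch`, `isPrintableToy`, and `Range`, and the tiny range tables, are hypothetical, and the real tables are far larger. The toy printable check only excludes C0/C1 controls and DEL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the documented error code.
enum WidthErrorSketch { NonPrintableSketch = -1 };

struct Range { int Lo, Hi; };

// Tiny illustrative tables (assumptions, not LLVM's data).
const Range CombiningToy[] = {{0x0300, 0x036F}, {0x20D0, 0x20FF}};
const Range DoubleWidthToy[] = {
    {0x1100, 0x115F},  // Hangul Jamo initial consonants
    {0x4E00, 0x9FFF},  // CJK Unified Ideographs
    {0xAC00, 0xD7A3},  // Hangul syllables
    {0xFF01, 0xFF60},  // fullwidth forms (halfwidth forms start at U+FF61)
};

template <std::size_t N>
bool inAny(const Range (&Table)[N], int UCS) {
  for (const Range &R : Table)
    if (UCS >= R.Lo && UCS <= R.Hi)
      return true;
  return false;
}

// Toy printable check: rejects out-of-range values, C0/C1 controls and DEL.
bool isPrintableToy(int UCS) {
  if (UCS < 0 || UCS > 0x10FFFF)
    return false;
  return !(UCS < 0x20 || (UCS >= 0x7F && UCS <= 0x9F));
}

// Applies the documented cases in order: error, 0, 2, then 1.
int charWidthSketch(int UCS) {
  if (!isPrintableToy(UCS))
    return NonPrintableSketch;
  if (inAny(CombiningToy, UCS))
    return 0;
  if (inAny(DoubleWidthToy, UCS))
    return 2;
  return 1;
}

// Usage example for the sketch.
void demoCharWidthSketch() {
  assert(charWidthSketch('A') == 1);
  assert(charWidthSketch(0x4E2D) == 2);  // a CJK ideograph
}
```

The printable check runs first so that a control character never falls through to the width tables.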

Definition at line 227 of file Unicode.cpp.

References llvm::sys::UnicodeCharSet::contains(), ErrorNonPrintableCharacter, and isPrintable().

Referenced by columnWidthUTF8().

◆ columnWidthUTF8()

int llvm::sys::unicode::columnWidthUTF8 ( StringRef  Text)

Gets the number of positions the UTF8-encoded Text is likely to occupy when output on a terminal ("character width").

This depends on the implementation of the terminal, and there's no standard definition of character width.

The implementation defines it in a way that is expected to be compatible with a generic Unicode-capable terminal.

Returns
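The total width of Text, or a negative error code:
  • ErrorInvalidUTF8 (-2) if Text is not valid UTF-8;
  • ErrorNonPrintableCharacter (-1) if Text contains non-printable characters (as identified by isPrintable);
  • otherwise the sum of the per-character widths, counting:
      0 for each non-spacing and enclosing combining mark;
      2 for each CJK character excluding halfwidth forms;
      1 for each of the remaining characters.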
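A decode-and-sum loop of this kind can be sketched without LLVM as follows. This is a minimal sketch under stated assumptions, not the code in Unicode.cpp: `decodeOne`, `toyCharWidth`, and `columnWidthSketch` are hypothetical names, the width rule uses a deliberately tiny table, and the sketch takes a `std::string` rather than a `StringRef`. The error values mirror the ColumnWidthErrors enumerators listed on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins mirroring the documented error values.
enum WidthError { InvalidUTF8 = -2, NonPrintable = -1 };

// Decodes one code point at S[I]. Returns bytes consumed, or 0 if malformed
// (bad lead byte, truncation, bad continuation, overlong form, surrogate).
std::size_t decodeOne(const std::string &S, std::size_t I, int &CP) {
  unsigned char B0 = static_cast<unsigned char>(S[I]);
  std::size_t Len = 0;
  int Min = 0;
  if (B0 < 0x80) { CP = B0; return 1; }
  if ((B0 & 0xE0) == 0xC0) { Len = 2; CP = B0 & 0x1F; Min = 0x80; }
  else if ((B0 & 0xF0) == 0xE0) { Len = 3; CP = B0 & 0x0F; Min = 0x800; }
  else if ((B0 & 0xF8) == 0xF0) { Len = 4; CP = B0 & 0x07; Min = 0x10000; }
  else return 0;
  if (I + Len > S.size())
    return 0;
  for (std::size_t K = 1; K < Len; ++K) {
    unsigned char B = static_cast<unsigned char>(S[I + K]);
    if ((B & 0xC0) != 0x80)
      return 0;
    CP = (CP << 6) | (B & 0x3F);
  }
  if (CP < Min || CP > 0x10FFFF || (CP >= 0xD800 && CP <= 0xDFFF))
    return 0;
  return Len;
}

// Toy per-character width (assumption: a very small subset of the real rules).
int toyCharWidth(int CP) {
  if (CP < 0x20 || (CP >= 0x7F && CP <= 0x9F)) return NonPrintable;
  if (CP >= 0x0300 && CP <= 0x036F) return 0;  // combining diacritics
  if (CP >= 0x4E00 && CP <= 0x9FFF) return 2;  // CJK ideographs
  return 1;
}

// Sums per-character widths; the first error aborts the whole computation.
int columnWidthSketch(const std::string &Text) {
  int Total = 0;
  for (std::size_t I = 0; I < Text.size();) {
    int CP = 0;
    std::size_t Len = decodeOne(Text, I, CP);
    if (Len == 0)
      return InvalidUTF8;
    int W = toyCharWidth(CP);
    if (W < 0)
      return NonPrintable;
    Total += W;
    I += Len;
  }
  return Total;
}

// Usage example: callers should test for a negative result before using it.
void demoColumnWidthSketch() {
  assert(columnWidthSketch("abc") == 3);
  assert(columnWidthSketch("a\tb") < 0);  // tab is a C0 control
}
```

Because both error codes are negative, a caller can use a single `< 0` test before treating the result as a column count.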

Definition at line 343 of file Unicode.cpp.

References charWidth(), llvm::conversionOK, llvm::ConvertUTF8toUTF32(), llvm::StringRef::data(), ErrorInvalidUTF8, ErrorNonPrintableCharacter, llvm::getNumBytesForUTF8(), llvm::StringRef::size(), and llvm::strictConversion.

Referenced by llvm::sys::locale::columnWidth().

◆ isPrintable()

bool llvm::sys::unicode::isPrintable ( int  UCS)

Determines if a character is likely to be displayed correctly on the terminal.

An exact implementation would have to depend on the specific terminal, so we define semantics that should be suitable for the generic case of a terminal capable of displaying Unicode characters.

All characters from the Unicode code point range are considered printable except for:

  • C0 and C1 control character ranges;
  • default ignorable code points as per 5.21 of http://www.unicode.org/versions/Unicode6.2.0/UnicodeStandard-6.2.pdf except for U+00AD SOFT HYPHEN, as it's actually displayed on most terminals;
  • format characters (category = Cf);
  • surrogates (category = Cs);
  • unassigned characters (category = Cn).
Returns
true if the character is considered printable.
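An exclusion list like the one above is naturally expressed as a sorted table of code point ranges searched by binary search. The sketch below illustrates that idea only: `CodePointRange`, `NonPrintableSubset`, `inSubset`, and `isPrintableSketch` are hypothetical names, the table is a small hand-picked subset of the excluded categories, and treating values outside U+0000..U+10FFFF as non-printable is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

struct CodePointRange { int Lower, Upper; };

// Sorted, non-overlapping ranges. Illustrative subset only (assumption);
// U+00AD SOFT HYPHEN is deliberately absent, matching the documented exception.
const CodePointRange NonPrintableSubset[] = {
    {0x0000, 0x001F},  // C0 controls
    {0x007F, 0x009F},  // DEL and C1 controls
    {0x200B, 0x200F},  // zero-width and directional format characters (Cf)
    {0xD800, 0xDFFF},  // surrogates (Cs)
};

// Binary search: find the first range whose upper bound is not below UCS,
// then check whether UCS also lies at or above that range's lower bound.
bool inSubset(int UCS) {
  const CodePointRange *End = std::end(NonPrintableSubset);
  const CodePointRange *It = std::lower_bound(
      std::begin(NonPrintableSubset), End, UCS,
      [](const CodePointRange &R, int V) { return R.Upper < V; });
  return It != End && It->Lower <= UCS;
}

bool isPrintableSketch(int UCS) {
  if (UCS < 0 || UCS > 0x10FFFF)
    return false;  // outside the Unicode code point range
  return !inSubset(UCS);
}

// Usage example for the sketch.
void demoIsPrintableSketch() {
  assert(isPrintableSketch('A'));
  assert(!isPrintableSketch(0x0A));   // line feed, a C0 control
  assert(isPrintableSketch(0x00AD));  // soft hyphen stays printable
}
```

Keeping the table sorted makes each lookup logarithmic in the number of ranges, which matters once the table covers all excluded categories.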

Definition at line 23 of file Unicode.cpp.

References llvm::sys::UnicodeCharSet::contains().

Referenced by charWidth(), and llvm::sys::locale::isPrint().