Basic character properties

Files CCore/inc/CharProp.h CCore/src/CharProp.cpp

This header contains a number of tools for working with character properties. They are all efficient, can be used in any context, and have fixed behavior, i.e. they do not depend on any global settings, such as the locale.

CharCode

CCore assumes the character type is 8-bit and uses ASCII encoding (at least for the first 128 code positions). The class ASCIICode represents an ASCII code.


class ASCIICode
 {
  public:
  
   using CodeType = uint8 ;
   
   static char InverseMap(CodeType code);
   
  private:
 
   CodeType code; 
   
  public:
  
   // constructors
  
   ASCIICode();
   
   template <class Char>
   explicit ASCIICode(Char ch);
   
   // properties
   
   bool isSpecial() const;
 
   bool isVisible() const;
   
   bool isPrintable() const;
 
   bool isSpace() const;
  
   bool isPunct() const;

   bool isSpaceOrPunct() const;

   int decValue() const;
   
   int hexValue() const;

   char getChar() const;
   
   // print object
   
   template <class P>
   void print(P &out) const;
 };

/* type CharCode */

using CharCode = ASCIICode ;

The default constructor of ASCIICode creates the zero ASCII code.

The explicit constructor maps a character (of type char, signed char, or unsigned char) to its ASCII code.

isSpecial() — special, i.e. not intended to represent a symbol.

isVisible() — visible, i.e. printable and not a space.

isPrintable() — printable, i.e. not a special.

isSpace() — space and some special "space-like" characters.

isPunct() — punctuation characters.

isSpaceOrPunct() — equivalent of isSpace() || isPunct().

decValue() — decimal value of the character, or -1 if not a decimal digit.

hexValue() — hexadecimal value of the character, or -1 if not a hexadecimal digit.

getChar() maps the code back to the character type.

An ASCIICode object can be printed. Special characters are printed using their C backslash representation, like "\n".

CharCode is a type alias for ASCIICode. It abstracts away the exact type of encoding.
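The -1 convention of decValue() and hexValue() can be illustrated with a minimal standalone sketch. It does not use CCore. The names DecValueSketch and HexValueSketch are hypothetical, and the code assumes ASCII, where the digits and the letters a to f are contiguous. The actual CCore implementation may differ.

```cpp
// Standalone sketch, not CCore code. Function names are hypothetical.
// Assumes ASCII encoding: '0'..'9', 'a'..'f' and 'A'..'F' are contiguous ranges.

// Returns 0..9 for a decimal digit, or -1 otherwise.
inline int DecValueSketch(char ch)
 {
  if( ch>='0' && ch<='9' ) return ch-'0';

  return -1;
 }

// Returns 0..15 for a hexadecimal digit (either letter case), or -1 otherwise.
inline int HexValueSketch(char ch)
 {
  if( ch>='0' && ch<='9' ) return ch-'0';
  if( ch>='a' && ch<='f' ) return ch-'a'+10;
  if( ch>='A' && ch<='F' ) return ch-'A'+10;

  return -1;
 }
```

A caller can test the result against zero to validate and convert a digit in one step, which is probably why a negative value is used to signal "not a digit".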

PrintCString

PrintCString is a helper class for printing a string. Each special character is printed using its C representation, like "\n".


class PrintCString
 {
   StrLen str;

  public:

   explicit PrintCString(StrLen str_) : str(str_) {}
 
   using PrintOptType = StrPrintOpt ;
   
   template <class P>
   void print(P &out,PrintOptType opt) const;
 };
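The effect of this kind of printing can be sketched without CCore as a function that builds the escaped text. Several things here are assumptions, not taken from the library: the name EscapeSpecialSketch, the definition of "special" as a code below 32 or at least 127, and the \xNN fallback for special characters that have no named C escape.

```cpp
#include <string>
#include <string_view>

// Standalone sketch, not CCore code. EscapeSpecialSketch is a hypothetical name.
// Assumption: "special" means code < 32 or code >= 127.
// Assumption: special characters without a named escape are shown as \xNN.
inline std::string EscapeSpecialSketch(std::string_view str)
 {
  static const char HexDigits[]="0123456789ABCDEF";

  std::string ret;

  for(char ch : str )
    {
     unsigned char code=static_cast<unsigned char>(ch);

     switch( ch )
       {
        case '\a' : ret+="\\a"; break;
        case '\b' : ret+="\\b"; break;
        case '\t' : ret+="\\t"; break;
        case '\n' : ret+="\\n"; break;
        case '\v' : ret+="\\v"; break;
        case '\f' : ret+="\\f"; break;
        case '\r' : ret+="\\r"; break;

        default:
         if( code<32 || code>=127 )
           {
            // no named escape: two hexadecimal digits
            ret+="\\x";
            ret+=HexDigits[code>>4];
            ret+=HexDigits[code&15];
           }
         else
           {
            // ordinary character is copied unchanged
            ret+=ch;
           }
       }
    }

  return ret;
 }
```

With this sketch, a string holding the characters a, tab, b, newline becomes the six visible characters a\tb\n.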

Character properties

The following functions return character properties. The type Char is one of the traditional character types: char, signed char, unsigned char.


template <class Char>
bool CharIsSpecial(Char ch) { return CharCode(ch).isSpecial(); }
 
template <class Char>
bool CharIsVisible(Char ch) { return CharCode(ch).isVisible(); }
 
template <class Char>
bool CharIsPrintable(Char ch) { return CharCode(ch).isPrintable(); } 
 
template <class Char>
bool CharIsSpace(Char ch) { return CharCode(ch).isSpace(); }
  
template <class Char>
bool CharIsPunct(Char ch) { return CharCode(ch).isPunct(); }

template <class Char>
bool CharIsSpaceOrPunct(Char ch) { return CharCode(ch).isSpaceOrPunct(); }

template <class Char>
int CharDecValue(Char ch) { return CharCode(ch).decValue(); }
 
template <class Char>
int CharHexValue(Char ch) { return CharCode(ch).hexValue(); }

CharIsSpecial() — special, i.e. not intended to represent a symbol.

CharIsVisible() — visible, i.e. printable and not a space.

CharIsPrintable() — printable, i.e. not a special.

CharIsSpace() — space and some special characters. The full list is " \t\f\v\r\n". This list can be obtained by the GetSpaceChars() function.

CharIsPunct() — punctuation characters.

CharIsSpaceOrPunct() — equivalent of CharIsSpace() || CharIsPunct().

CharDecValue() — decimal value of the character, or -1 if not a decimal digit.

CharHexValue() — hexadecimal value of the character, or -1 if not a hexadecimal digit.
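Since the space set is listed explicitly, a space test can be sketched as a lookup in that list. The following standalone code does not use CCore. The names IsSpaceSketch and SkipSpaceSketch are hypothetical, and the library itself may well use a different technique, such as a table.

```cpp
#include <string_view>

// Standalone sketch, not CCore code. Function names are hypothetical.

// True if ch is one of the six characters in the documented space list.
inline bool IsSpaceSketch(char ch)
 {
  constexpr std::string_view SpaceChars=" \t\f\v\r\n";

  return SpaceChars.find(ch)!=std::string_view::npos;
 }

// Returns str without its leading space characters.
inline std::string_view SkipSpaceSketch(std::string_view str)
 {
  while( !str.empty() && IsSpaceSketch(str.front()) ) str.remove_prefix(1);

  return str;
 }
```

The lookup goes through std::string_view, whose length excludes the terminating zero, so the zero character is not reported as a space.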

Character sets

The following functions return zero-terminated strings that contain some important character sets.


inline const char * GetSpaceChars() { return " \t\f\v\r\n"; }

inline const char * GetPunctChars() { return "!\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~"; } 

inline const char * GetDigitChars() { return "0123456789"; }

inline const char * GetHexDigitChars() { return "0123456789abcdefABCDEF"; }

inline const char * GetCLetterChars() { return "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_"; }

GetSpaceChars() — "traditional" C-space characters.

GetPunctChars() — punctuation characters.

GetDigitChars() — decimal digits.

GetHexDigitChars() — hexadecimal digits.

GetCLetterChars() — C-letters, including underscore.
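One possible use of such sets is a membership test, for example to check whether a name is a valid C identifier. The sketch below is standalone: it repeats two of the set functions inside a hypothetical namespace Sketch so that it compiles without CCore. InSet and IsCIdentifier are hypothetical helpers, not library functions.

```cpp
#include <cstring>
#include <string_view>

// Standalone sketch, not CCore code. The namespace and helper names are hypothetical.
namespace Sketch {

// Local copies of the documented sets, so the sketch builds without CCore.
inline const char * GetDigitChars() { return "0123456789"; }

inline const char * GetCLetterChars() { return "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_"; }

// True if ch belongs to the zero-terminated set.
// std::strchr() also matches the terminating zero, so that case is excluded first.
inline bool InSet(const char *set,char ch)
 {
  return ch!='\0' && std::strchr(set,ch)!=nullptr;
 }

// True if name is a C-letter followed by any number of C-letters or digits.
inline bool IsCIdentifier(std::string_view name)
 {
  if( name.empty() || !InSet(GetCLetterChars(),name[0]) ) return false;

  for(char ch : name.substr(1) )
    {
     if( !InSet(GetCLetterChars(),ch) && !InSet(GetDigitChars(),ch) ) return false;
    }

  return true;
 }

} // namespace Sketch
```

For example, Sketch::IsCIdentifier("_name1") is true, while "1name" and "a-b" are rejected.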

Line parsing

The following function can be used to split the given text into lines:


StrLen CutLine(StrLen &text);

The function searches the text for a line-end divider: "\r", "\n", or "\r\n". If a divider is found, the part before the divider is returned and text is changed to the part after the divider. Otherwise, the entire text is returned and text is changed to the null value.
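The described behavior can be approximated in standard C++ as follows. This is a sketch only: it uses std::string_view as a stand-in for StrLen, the name CutLineSketch is hypothetical, and a default-constructed view is assumed to play the role of the null value.

```cpp
#include <cstddef>
#include <string_view>

// Standalone sketch, not CCore code. CutLineSketch is a hypothetical name,
// and std::string_view stands in for StrLen.
inline std::string_view CutLineSketch(std::string_view &text)
 {
  std::size_t pos=text.find_first_of("\r\n");

  if( pos==std::string_view::npos )
    {
     // no divider: return everything, leave the "null" view behind
     std::string_view ret=text;

     text=std::string_view();

     return ret;
    }

  std::string_view ret=text.substr(0,pos);

  // "\r\n" counts as a single divider of length 2
  std::size_t skip=1;

  if( text[pos]=='\r' && pos+1<text.size() && text[pos+1]=='\n' ) skip=2;

  text.remove_prefix(pos+skip);

  return ret;
 }
```

Calling the sketch repeatedly on "one\r\ntwo\nthree" yields "one", "two", and "three". After the last call text.data() is a null pointer, which a loop can use to tell "no more input" apart from an empty remainder after a trailing divider.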