Module Unicode

Unicode utilities

type status
val classify : int -> status

Classify a Unicode character, given as its code point, into one of three classes, or as unknown.

val ident_refutation : string -> (bool * string) option

Return None if the given string can be used as a (Coq) identifier. Otherwise return Some (b, s), where s is an explanation and b is the severity.
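
As a hedged usage sketch, the option result can be consumed with a pattern match. The names describe and toy_refute are hypothetical: toy_refute is only a stand-in so the example runs without the library, and in real code Unicode.ident_refutation would be passed instead. The sketch prints the severity flag without interpreting it, since its meaning is not spelled out here.

```ocaml
(* [refute] stands in for Unicode.ident_refutation. *)
let describe (refute : string -> (bool * string) option) (name : string) : string =
  match refute name with
  | None -> Printf.sprintf "%s: accepted" name
  | Some (severity, why) ->
    (* The boolean is reported as-is; its exact meaning is not assumed. *)
    Printf.sprintf "%s: rejected (severity flag %b): %s" name severity why

(* Hypothetical stand-in, NOT the real checker: it only rejects "". *)
let toy_refute (s : string) : (bool * string) option =
  if s = "" then Some (true, "empty string") else None

let () = print_endline (describe toy_refute "foo")
```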

val is_valid_ident_initial : status -> bool

Tell whether the status is that of a valid initial character for an identifier.

val is_valid_ident_trailing : status -> bool

Tell whether the status is that of a valid non-initial character for an identifier.

val is_letter : status -> bool

Tell whether the status is that of a letter.

val is_unknown : status -> bool

Tell whether the status is that of an unclassified character.

val lowercase_first_char : string -> string

Return the first character of a string, converted to lowercase.

  • raises Assert_failure

    if the input string is empty.

val split_at_first_letter : string -> (string * string) option

Split a string, assumed to be an identifier, at its first letter. As an optimization, return None if the first character is already a letter.

val is_basic_ascii : string -> bool

Return true if every UTF-8 character in the input string is a plain ASCII character, and false otherwise.
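
A minimal sketch of the check described above, assuming that "plain ASCII" means every byte is below 128. The name is_basic_ascii_sketch is hypothetical, and this is not necessarily how the library implements it.

```ocaml
(* True when every byte of [s] is in the 7-bit ASCII range. *)
let is_basic_ascii_sketch (s : string) : bool =
  let rec go i =
    i >= String.length s || (Char.code s.[i] < 128 && go (i + 1))
  in
  go 0

let () =
  assert (is_basic_ascii_sketch "abc_1");
  (* "caf\xc3\xa9" is "café" encoded in UTF-8. *)
  assert (not (is_basic_ascii_sketch "caf\xc3\xa9"))
```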

val ascii_of_ident : string -> string

ascii_of_ident s maps a UTF-8 string to a string composed solely of ASCII characters. Non-ASCII characters are translated to "_UUxxxx_", where xxxx is the Unicode code point of the character in hexadecimal (four to six hex digits). To avoid potential name clashes, any preexisting substring "_UU" is turned into "_UUU".
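
The translation rule can be illustrated with a small standalone sketch. It is an approximation under stated assumptions, not the library's code: the name ascii_of_ident_sketch is hypothetical, lowercase hex digits are assumed, the input is assumed to be valid UTF-8, and decoding relies on String.get_utf_8_uchar from the OCaml standard library (4.14 or later).

```ocaml
let ascii_of_ident_sketch (s : string) : string =
  let len = String.length s in
  (* Does the literal substring "_UU" start at byte offset [i]? *)
  let has_uu i =
    i + 2 < len && s.[i] = '_' && s.[i + 1] = 'U' && s.[i + 2] = 'U'
  in
  let out = Buffer.create (2 * len) in
  let rec go i =
    if i < len then
      if Char.code s.[i] >= 128 then begin
        (* Non-ASCII: decode one code point and emit "_UUxxxx_". *)
        let d = String.get_utf_8_uchar s i in
        let cp = Uchar.to_int (Uchar.utf_decode_uchar d) in
        Printf.bprintf out "_UU%04x_" cp;
        go (i + Uchar.utf_decode_length d)
      end else if has_uu i then begin
        (* Escape a preexisting "_UU" so the encoding stays unambiguous. *)
        Buffer.add_string out "_UUU";
        go (i + 3)
      end else begin
        Buffer.add_char out s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents out

let () =
  (* "\xce\xb1" is the UTF-8 encoding of U+03B1 (Greek small alpha). *)
  assert (ascii_of_ident_sketch "\xce\xb1" = "_UU03b1_");
  assert (ascii_of_ident_sketch "x_UUy" = "x_UUUy")
```

The "%04x" format pads to at least four digits and grows to five or six for code points above U+FFFF, which matches the "four to six hex digits" wording.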

val is_utf8 : string -> bool

Tell whether a string is valid UTF-8.
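
For comparison, a comparable check can be written with the OCaml standard library alone (4.14 or later). This is only a sketch: is_utf8_sketch is a hypothetical name, and the library's own validator may accept or reject edge cases differently from String.is_valid_utf_8.

```ocaml
let is_utf8_sketch (s : string) : bool = String.is_valid_utf_8 s

let () =
  (* "caf\xc3\xa9" is "café" in UTF-8; "caf\xe9" is the Latin-1 encoding. *)
  assert (is_utf8_sketch "caf\xc3\xa9");
  assert (not (is_utf8_sketch "caf\xe9"))
```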

val utf8_length : string -> int

Return the length, in characters rather than bytes, of a valid UTF-8 string.
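
One common way to count characters in valid UTF-8 is to count the bytes that are not continuation bytes (those of the form 10xxxxxx). The sketch below uses that technique under the assumption that the input is valid UTF-8 and that "length" means the number of code points. The name utf8_length_sketch is hypothetical, and the library may compute the result differently.

```ocaml
let utf8_length_sketch (s : string) : int =
  let n = ref 0 in
  (* Every code point contributes exactly one non-continuation byte. *)
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

let () =
  (* "café" is 5 bytes in UTF-8 but 4 characters. *)
  assert (String.length "caf\xc3\xa9" = 5);
  assert (utf8_length_sketch "caf\xc3\xa9" = 4)
```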

val utf8_sub : string -> int -> int -> string

Variant of String.sub for UTF-8 strings.

val escaped_if_non_utf8 : string -> string

Return a "%XX"-escaped version of the string if it contains bytes that are not valid UTF-8.
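
A rough sketch of this kind of escaping follows. Several points are assumptions rather than documented behavior: that a valid string is returned unchanged, that only the offending bytes are escaped while valid sequences are kept, and that XX is two uppercase hex digits. The name escaped_if_non_utf8_sketch is hypothetical, and the code relies on the OCaml 4.14 standard library decoder.

```ocaml
let escaped_if_non_utf8_sketch (s : string) : string =
  if String.is_valid_utf_8 s then s
  else begin
    let out = Buffer.create (3 * String.length s) in
    let rec go i =
      if i < String.length s then begin
        let d = String.get_utf_8_uchar s i in
        let n = Uchar.utf_decode_length d in
        if Uchar.utf_decode_is_valid d then
          (* Keep well-formed sequences as they are. *)
          Buffer.add_substring out s i n
        else
          (* Escape each byte of the malformed sequence as %XX. *)
          for k = i to i + n - 1 do
            Printf.bprintf out "%%%02X" (Char.code s.[k])
          done;
        go (i + n)
      end
    in
    go 0;
    Buffer.contents out
  end

let () =
  assert (escaped_if_non_utf8_sketch "abc" = "abc");
  (* 0xFF can never appear in UTF-8, so it is escaped. *)
  assert (escaped_if_non_utf8_sketch "a\xffb" = "a%FFb")
```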