String Syntax

Strings are tokens that start and end with the same quote character. Between the quote characters, practically arbitrary characters can be used. Line comments and block comments are not recognized inside strings. Among the layout characters, only the space (" ") can be part of a string. To include the quote character itself in a string, we double it. The single quote ('), the double quote (") and the back quote (`) can start and end a string:

str_single --> "'" { "''" | "\\" esccont | " " | str_char } "'".
str_double --> "\"" { "\"\"" | "\\" esccont | " " | str_char } "\"".
str_back --> "`" { "``" | "\\" esccont | " " | str_char } "`".
str_char --> char except layout.

esccont --> control | meta | escape | eol.
control --> "a" | "b" | "r" | "f" | "t" | "n" | "v".
meta --> "\\" | "'" | "\"" | "`" | "/".
escape --> oct_code | uni_code | hex_code.

oct_code --> oct_digit { oct_digit } "\\".
oct_digit --> "0" ... "7".
uni_code --> "u" hex_digit hex_digit hex_digit hex_digit.
hex_code --> "x" hex_digit { hex_digit } "\\".
hex_digit --> digit | "A" ... "F" | "a" ... "f".

Examples:
"Hello ""John""!" % is a double quoted string
`Line 1\nLine 2` % is a back quoted string
"very-long-\
code-list" % is a double quoted string
'\xE54\' % is a single quoted string
"\uD83D\uDE02" % is a double quoted string

Strings can contain escape sequences that start with the backslash (\). After the backslash, an escape code, a control symbol, a meta-code or an end of line can follow. The escape codes comprise octal codes, uni-codes and hexadecimal codes. Surrogate pairs are automatically combined into a single code point.
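As an illustration of how a surrogate pair such as \uD83D\uDE02 collapses into one code point, the standard UTF-16 arithmetic can be sketched in Python. The function name combine_surrogates is hypothetical, and the sketch only shows the arithmetic, not how the Prolog system implements it.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 payload bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The pair from the example "\uD83D\uDE02":
print(hex(combine_surrogates(0xD83D, 0xDE02)))  # prints 0x1f602
```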

The octal code is simply a sequence of octal digits terminated by the backslash. The uni-codes start with the Unicode indicator ('u') and require exactly four hexadecimal digits. The hexadecimal code starts with a hexadecimal indicator ('x') followed by a sequence of hexadecimal digits terminated by the backslash.
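The three numeric forms can be recognized with one regular expression. The following Python sketch is an assumed reading of the rules above, not the system's tokenizer; the names NUMERIC_ESCAPE and decode_numeric_escapes are hypothetical. Other backslash pairs are matched as well, so that an escaped backslash is not mistaken for the start of a numeric escape, but they are left unchanged.

```python
import re

# \<octal digits>\  |  \uXXXX  |  \x<hex digits>\  |  any other backslash pair
NUMERIC_ESCAPE = re.compile(
    r"\\(?:([0-7]+)\\|u([0-9A-Fa-f]{4})|x([0-9A-Fa-f]+)\\|(.))",
    re.DOTALL,
)


def decode_numeric_escapes(body: str) -> str:
    """Replace octal, uni-code and hexadecimal escapes by their characters."""
    def replace(match: re.Match) -> str:
        octal, uni, hexa, other = match.groups()
        if octal is not None:
            return chr(int(octal, 8))
        if uni is not None:
            return chr(int(uni, 16))
        if hexa is not None:
            return chr(int(hexa, 16))
        return match.group(0)  # control or meta escape: left as-is here

    return NUMERIC_ESCAPE.sub(replace, body)
```

For instance, decode_numeric_escapes("\\xE54\\") yields the single character with code point 0xE54, matching the single quoted example above.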

There are control symbols for the alert ('\a'), the backspace ('\b'), the carriage return ('\r'), the form feed ('\f'), the horizontal tab ('\t'), the new line ('\n') and the vertical tab ('\v'). Among the meta-codes we find the backslash ('\\'), the single quote ('\''), the double quote ('\"'), the back quote ('\`') and the slash ('\/'). When escaped, each of these characters simply denotes itself.

Strings are not allowed to include an end of line. Instead, the escape sequence for the line feed ('\n') should be used. A backslash followed by an end of line continues a string on the next line. Strings are also not allowed to include layout characters other than the space. These have to be escaped as well.
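The rules for doubled quotes, line continuation and unescaped layout characters can be put together in a small decoder. The Python sketch below is hypothetical (the names unquote, CONTROL and META are not part of the system). It assumes that an end of line is a single line feed and that a continuation contributes no character, and it deliberately leaves out numeric escapes.

```python
CONTROL = {"a": "\a", "b": "\b", "r": "\r", "f": "\f",
           "t": "\t", "n": "\n", "v": "\v"}
META = {"\\", "'", '"', "`", "/"}


def unquote(token: str) -> str:
    """Decode a quoted token (numeric escapes are not handled in this sketch)."""
    if len(token) < 2 or token[0] not in "'\"`" or token[-1] != token[0]:
        raise ValueError("not a quoted string")
    quote, body = token[0], token[1:-1]
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == quote:
            if nxt != quote:
                raise ValueError("quote character must be doubled")
            out.append(quote)  # doubled quote denotes one quote
            i += 2
        elif ch == "\\":
            if nxt == "\n":
                pass  # continuation: backslash and end of line are dropped
            elif nxt in CONTROL:
                out.append(CONTROL[nxt])
            elif nxt in META:
                out.append(nxt)
            else:
                raise ValueError("unsupported escape in this sketch")
            i += 2
        elif ch != " " and ch.isspace():
            raise ValueError("layout character must be escaped")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Applied to the examples above, unquote('"Hello ""John""!"') gives Hello "John"!, and the continued string decodes to very-long-code-list, while a raw tab or line feed inside the quotes is rejected.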

Comments