
Commit 044cf48

RexJaeschke authored and BillWagner committed
Support raw string literals
1 parent cf64a32 commit 044cf48

1 file changed

Lines changed: 116 additions & 3 deletions


standard/lexical-structure.md

@@ -898,18 +898,21 @@ The type of a *Character_Literal* is `char`.
898898
899899
#### 6.4.5.6 String literals
900900
901-
C# supports two forms of string literals: ***regular string literal***s and ***verbatim string literal***s. A regular string literal consists of zero or more characters enclosed in double quotes, as in `"hello"`, and can include both simple escape sequences (such as `\t` for the tab character), and hexadecimal and Unicode escape sequences.
901+
C# supports three forms of string literals: ***regular string literal***s, ***verbatim string literal***s, and ***raw string literal***s. A regular string literal consists of zero or more characters enclosed in double quotes, as in `"hello"`, and can include both simple escape sequences (such as `\t` for the tab character), and hexadecimal and Unicode escape sequences.
902902
903903
A verbatim string literal consists of an `@` character followed by a double-quote character, zero or more characters, and a closing double-quote character.
904904
905905
> *Example*: A simple example is `@"hello"`. *end example*
906906
907907
In a verbatim string literal, the characters between the delimiters are interpreted verbatim, with the only exception being a *Quote_Escape_Sequence*, which represents one double-quote character. In particular, simple escape sequences, and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
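
As a minimal sketch of the difference between the two forms described so far (the class and member names are hypothetical, chosen only for illustration), the following shows the same characters written as a regular and as a verbatim string literal:

```csharp
public static class VerbatimSketch
{
    // Regular literal: \t is a simple escape sequence, so the value has 3 characters.
    public static string Regular() => "a\tb";

    // Verbatim literal: the backslash and the 't' are kept as written (4 characters).
    public static string Verbatim() => @"a\tb";

    // In a verbatim literal, "" (a Quote_Escape_Sequence) yields one double-quote.
    public static string Quoted() => @"say ""hi""";
}
```

Here `Regular()` contains a tab character, `Verbatim()` contains a backslash followed by `t`, and `Quoted()` has the value `say "hi"`.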
908908
909+
A raw string literal consists of arbitrary text and newlines enclosed between delimiters, each of which is a sequence of three or more `"` characters. This form better preserves the readability of XML, JSON, and other text that has a visually pleasing structure. A raw string literal may span multiple lines.
910+
909911
```ANTLR
910912
String_Literal
911913
: Regular_String_Literal
912914
| Verbatim_String_Literal
915+
| Raw_String_Literal
913916
;
914917
915918
fragment Regular_String_Literal
@@ -944,8 +947,59 @@ fragment Single_Verbatim_String_Literal_Character
944947
fragment Quote_Escape_Sequence
945948
: '""'
946949
;
950+
951+
fragment Raw_String_Literal
952+
: Single_Line_Raw_String_Literal
953+
| Multi_Line_Raw_String_Literal
954+
;
955+
956+
fragment Single_Line_Raw_String_Literal
957+
: Raw_String_Literal_Delimiter Raw_String_Literal_Content
958+
Raw_String_Literal_Delimiter
959+
;
960+
961+
fragment Raw_String_Literal_Delimiter
962+
: '"""' '"'*
963+
;
964+
965+
fragment Raw_String_Literal_Content
966+
// anything except New_Line
967+
: ~( '\u000D\u000A' | '\u000D' | '\u000A' | '\u0085' | '\u2028' | '\u2029')
968+
;
969+
970+
fragment Multi_Line_Raw_String_Literal
971+
: Raw_String_Literal_Delimiter Whitespace* New_Line
972+
(Raw_String_Literal_Content | New_Line)* New_Line
973+
Whitespace* Raw_String_Literal_Delimiter
974+
;
947975
```
948976
977+
For brevity, a *Raw_String_Literal_Delimiter* is referred to as a “delimiter,” the start *Raw_String_Literal_Delimiter* is referred to as the “start delimiter,” and the end *Raw_String_Literal_Delimiter* is referred to as the “end delimiter.”
978+
979+
For any *Raw_String_Literal*:
980+
981+
- A delimiter shall be the longest set of contiguous `"` characters found at the start or end. The number of `"` characters in a delimiter is called the ***raw string literal delimiter length***.
982+
> *Example*: The string `""" """` is well-formed; it has 3-character start and end delimiters, and its content is a single space. However, the string `""""""` is ill-formed, as it is seen as a 6-character start delimiter, with no content, and no end delimiter, not as 3-character start and end delimiters and empty content. *end example*
983+
- The beginning and end delimiters shall have the same raw string literal delimiter length.
984+
> *Example*: The string `""""X""""` is well-formed; it has 4-character start and end delimiters. However, the strings `"""X""""` and `""""X"""` are ill-formed, as the start and end delimiters in each pair do not have the same length. *end example*
985+
- A *Raw_String_Literal_Content* shall not contain a set of contiguous `"` characters whose length is equal to or greater than the raw string literal delimiter length.
986+
> *Example*: The strings `"""" """ """"` and `""""""" """""" """"" """" """ """""""` are well-formed. However, the strings `""" """ """` and `""" """" """` are ill-formed. *end example*
987+
- As text sequences that have the form of *Comment*s are not processed within string literals ([§6.3.3](lexical-structure.md#633-comments)), they appear verbatim in their corresponding *Raw_String_Literal_Content*.
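
The delimiter rules above can be sketched as follows. This is only an illustration, assuming a compiler that supports raw string literals; the class and member names are hypothetical.

```csharp
public static class DelimiterSketch
{
    // 3-character start and end delimiters; the content is a single space.
    public static string SingleSpace() => """ """;

    // 4-character delimiters permit a run of three quotes inside the content.
    public static string ThreeQuotesInside() => """" """ """";

    // Text that has the form of a comment is not processed; it appears verbatim.
    public static string CommentLike() => """a /* b */ // c""";
}
```

The values are a single space, a space followed by three double-quotes and another space, and the text `a /* b */ // c`, respectively.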
988+
989+
For a *Single_Line_Raw_String_Literal* only:
990+
991+
- A *Single_Line_Raw_String_Literal* cannot be empty; it must contain at least one character.
992+
- A *Raw_String_Literal_Content* cannot begin with `"`, as such a character is considered to belong to the preceding start delimiter. Similarly, a *Raw_String_Literal_Content* cannot end with `"`, as such a character is considered to belong to the following end delimiter.
993+
- The value of the literal is *Raw_String_Literal_Content*, which can contain leading, embedded, and trailing horizontal whitespace (as in `"""x x x"""` and `""" xxx """`, the latter having a leading space and trailing tabs).
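
A brief sketch of the single-line rules, again with hypothetical names and assuming a compiler that supports raw string literals:

```csharp
public static class SingleLineSketch
{
    // Leading and trailing spaces between the delimiters are part of the value.
    public static string Padded() => """ xxx """;

    // Quotes may appear inside the content, but not as its first or last character.
    public static string InnerQuotes() => """say "hi" now""";
}
```

`Padded()` has five characters (a space, `xxx`, and a space), and `InnerQuotes()` has the value `say "hi" now`. Content that needs to begin or end with `"` can be written as a multi-line raw string literal instead.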
994+
995+
For a *Multi_Line_Raw_String_Literal* only:
996+
997+
- If *Whitespace* precedes the end delimiter on the same line, the same number and kind of whitespace characters (e.g., spaces vs. tabs) shall exist at the beginning of each *Raw_String_Literal_Content*, and that leading whitespace shall be discarded from those *Raw_String_Literal_Content*s.
998+
- A *Raw_String_Literal_Content* shall not appear on the same line as a start or end delimiter.
999+
- A *Multi_Line_Raw_String_Literal* can be empty (by having no *Raw_String_Literal_Content*s and one or more *New_Line*s).
1000+
- A *Raw_String_Literal_Content* can begin or end with `"`.
1001+
- The value of the literal is the lexical concatenation of all of its *Raw_String_Literal_Content*s and *New_Line*s after any whitespace at the beginning of each *Raw_String_Literal_Content* has been discarded based on the whitespace preceding the end delimiter. Whitespace following the start delimiter and whitespace preceding the end delimiter are not included.
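
The multi-line rules can be sketched in a similar way. The names are hypothetical, and the sketch assumes a compiler that supports raw string literals. In both literals the end delimiter is preceded by eight spaces, so eight leading spaces are discarded from each content line.

```csharp
public static class MultiLineSketch
{
    // The first content line keeps no indentation; the second keeps two spaces.
    public static string Indented() =>
        """
        first
          second
        """;

    // In the multi-line form, content may begin and end with a double-quote.
    public static string Quoted() =>
        """
        "quoted"
        """;
}
```

`Indented()` consists of `first`, one line break, and `  second`; the line breaks adjacent to the delimiters are not part of the value. `Quoted()` has the value `"quoted"`, including both double-quotes.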
1002+
9491003
> *Example*: The example
9501004
>
9511005
> <!-- Example: {template:"code-in-main-without-using", name:"StringLiterals", ignoredWarnings:["CS0219"]} -->
@@ -969,6 +1023,56 @@ fragment Quote_Escape_Sequence
9691023
> *end example*
9701024
<!-- markdownlint-disable MD028 -->
9711025
1026+
<!-- markdownlint-enable MD028 -->
1027+
> *Example*: Consider the following multi-line raw string literals:
1028+
>
1029+
> <!-- Example: {template:"standalone-console", name:"RawStringLiteral1", inferOutput:true, ignoredWarnings:["CS0219"]} -->
1030+
> ```csharp
1031+
> var xml1 = """
1032+
> <element attr="content">
1033+
> <body>
1034+
> </body>
1035+
> </element>
1036+
> """;
1037+
> Console.WriteLine(xml1);
1038+
>
1039+
> var xml2 = """
1040+
> <element attr="content">
1041+
> <body>
1042+
> </body>
1043+
> </element>
1044+
> """;
1045+
> Console.WriteLine(xml2);
1046+
>
1047+
> var xml3 = """
1048+
> <element attr="content">
1049+
> <body>
1050+
> </body>
1051+
> </element>
1052+
> """;
1053+
> Console.WriteLine(xml3);
1054+
> ```
1055+
>
1056+
> which produces the output
1057+
>
1058+
> ```console
1059+
> <element attr="content">
1060+
> <body>
1061+
> </body>
1062+
> </element>
1063+
> <element attr="content">
1064+
> <body>
1065+
> </body>
1066+
> </element>
1067+
> <element attr="content">
1068+
> <body>
1069+
> </body>
1070+
> </element>
1071+
> ```
1072+
>
1073+
> In the case of `xml1`, the end delimiter has 8 leading spaces, so that is the amount of leading whitespace removed from each content line. With `xml2`, 4 leading spaces are removed, and with `xml3`, no leading spaces are removed. *end example*
1074+
<!-- markdownlint-disable MD028 -->
1075+
9721076
<!-- markdownlint-enable MD028 -->
9731077
> *Note*: Any line breaks within verbatim string literals are part of the resulting string. If the exact characters used to form line breaks are semantically relevant to an application, any tools that translate line breaks in source code to different formats (between “`\n`” and “`\r\n`”, for example) will change application behavior. Developers should be careful in such situations. *end note*
9741078
<!-- markdownlint-disable MD028 -->
@@ -982,20 +1086,29 @@ Each string literal does not necessarily result in a new string instance. When t
9821086
9831087
> *Example*: For instance, the output produced by
9841088
>
985-
> <!-- Example: {template:"standalone-console-without-using", name:"ObjectReferenceEquality", expectedOutput:["True"]} -->
1089+
> <!-- Example: {template:"standalone-console-without-using", name:"ObjectReferenceEquality", expectedOutput:["True","True","True","True"]} -->
9861090
> ```csharp
9871091
> class Test
9881092
> {
9891093
> static void Main()
9901094
> {
9911095
> object a = "hello";
9921096
> object b = "hello";
1097+
> object c = @"hello";
1098+
> object d = """hello""";
1099+
> object e = """
1100+
> hello
1101+
> """;
1102+
9931103
> System.Console.WriteLine(a == b);
1104+
> System.Console.WriteLine(a == c);
1105+
> System.Console.WriteLine(a == d);
1106+
> System.Console.WriteLine(a == e);
9941107
> }
9951108
> }
9961109
> ```
9971110
>
998-
> is `True` because the two literals refer to the same string instance.
1111+
> is all `True` because the five literals refer to the same string instance.
9991112
>
10001113
> *end example*
10011114
