Commit 38d5138

RexJaeschke authored and BillWagner committed
Support UTF-8 string literals
1 parent 5047856 commit 38d5138

1 file changed

Lines changed: 26 additions & 5 deletions

standard/expressions.md

@@ -4282,7 +4282,7 @@ Lifted ([§12.4.8](expressions.md#1248-lifted-operators)) forms of the unlifted

 For an operation of the form `x + y`, binary operator overload resolution ([§12.4.5](expressions.md#1245-binary-operator-overload-resolution)) is applied to select a specific operator implementation. The operands are converted to the parameter types of the selected operator, and the type of the result is the return type of the operator.

-The predefined addition operators are listed below. For numeric and enumeration types, the predefined addition operators compute the sum of the two operands. When one or both operands are of type `string`, the predefined addition operators concatenate the string representation of the operands.
+The predefined addition operators are listed below. For numeric and enumeration types, the predefined addition operators compute the sum of the two operands. When one or both operands are of type `string`, or both are of type `ReadOnlySpan<byte>`, the predefined addition operators concatenate the string representation of the operands.

 - Integer addition:

@@ -4332,15 +4332,15 @@ The predefined addition operators are listed below. For numeric and enumeration
 ```

 At run-time these operators are evaluated exactly as `(E)((U)x + (U)y`).
-- String concatenation:
+- UTF-16 string concatenation:

 ```csharp
 string operator +(string x, string y);
 string operator +(string x, object y);
 string operator +(object x, string y);
 ```

-These overloads of the binary `+` operator perform string concatenation. If an operand of string concatenation is `null`, an empty string is substituted. Otherwise, any non-`string` operand is converted to its string representation by invoking the virtual `ToString` method inherited from type `object`. If `ToString` returns `null`, an empty string is substituted.
+These overloads of the binary `+` operator perform concatenation of UTF-16 strings. If an operand is `null`, an empty UTF-16 string is substituted. Otherwise, any non-`string` operand that is not a ref struct ([§16.2.3]( structs.md#1623-ref-modifier)) is converted to its UTF-16 string representation by invoking the virtual `ToString` method inherited from type `object`. If `ToString` returns `null`, an empty UTF-16 string is substituted.

 > *Example*:
 >
@@ -4369,7 +4369,28 @@ The predefined addition operators are listed below. For numeric and enumeration
 >
 > *end example*

-The result of the string concatenation operator is a `string` that consists of the characters of the left operand followed by the characters of the right operand. The string concatenation operator never returns a `null` value. A `System.OutOfMemoryException` may be thrown if there is not enough memory available to allocate the resulting string.
+The result of the operator is a `string` that consists of the characters of the left operand followed by the characters of the right operand. The string concatenation operator never returns a `null` value. A `System.OutOfMemoryException` may be thrown if there is not enough memory available to allocate the resulting string.
+- UTF-8 string concatenation:
+
+```csharp
+ReadOnlySpan<byte> operator +(ReadOnlySpan<byte> x, ReadOnlySpan<byte> y);
+```
+
+This overload of the binary `+` operator performs concatenation of UTF-8 string literals and the concatenated results thereof (which is much more restrictive than for UTF-16 string concatenation). The operands shall be UTF-8-encoded values.
+The result of the operator is a ReadOnlySpan<byte> that consists of the bytes of the left operand followed by the bytes of the right operand. The result may be used directly as an operand to the UTF-8 string concatenation operator.
+
+> *Example*:
+>
+> <!-- Example: {template:"standalone-console", name:"AdditionOperator2", expectedErrors:["CS9047","CS9047"]} -->
+> ```csharp
+> ReadOnlySpan<byte> sp1 = "ABC"u8 + "DEF"u8; // OK
+> ReadOnlySpan<byte> sp2 = sp1 + "DEF"u8; // error
+> ReadOnlySpan<byte> sp3 = "ABC"u8 + "DEF"u8 + "123"u8; // OK
+> ReadOnlySpan<byte> sp4 = "ABC"u8 + (ReadOnlySpan<byte>)stackalloc byte[]
+> { (byte)'D', (byte)'E', (byte)'F', (byte)'\x0' }; // error
+> ```
+>
+> In the case of `sp1`, both operands are UTF-8 string literals. However, once `sp1` is initialized, that UTF-8 pedigree is no longer tracked. That is, `sp1` itself is not seen as being UTF-8 encoded. As such, it is not permitted to be an operand in the case of the initialization of `sp2`. In the initializer for `sp3`, the left pair of operands is evaluated, and as they are both UTF-8 string literals, the result is deemed to also be UTF-8 encoded, so it can further be used as the left operand of the right operator. In the case of `sp4`, while both operands are `ReadOnlySpan<byte>`s, only the left operand is UTF-8 encoded, even though the `Span<byte>` returned by `stackalloc` has the internal form of a UTF-8 string literal (that is, an array of bytes with a null-byte terminator). See [§6.4.5.6](lexical-structure.md#6456-string-literals). *end example*
 - Delegate combination. Every delegate type implicitly provides the following predefined operator, where `D` is the delegate type:

 ```csharp
@@ -7491,7 +7512,7 @@ A *constant_expression* of type `nint` shall have a value in the range \[-214748

 Only the following constructs are permitted in constant expressions:

-- Literals (including the `null` literal).
+- Literals (including the `null` literal, but excluding UTF-8 string literals).
 - Constant interpolated strings.
 - References to `const` members of class, struct, and interface types.
 - References to members of enumeration types.
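
The concatenation rules touched by this diff can be sketched as a small console program. This is an illustrative sketch only and not part of the commit: it assumes a C# 11 or later compiler (for the `u8` suffix), and the class name `ConcatSketch` and all variable names are hypothetical.

```csharp
using System;

class ConcatSketch   // hypothetical name, not taken from the commit
{
    static void Main()
    {
        string s = null;
        object o = null;

        // UTF-16 concatenation: a null operand is replaced by an empty string.
        Console.WriteLine("[" + s + "]");   // prints "[]"
        Console.WriteLine("[" + o + "]");   // prints "[]"

        // A non-string operand is converted by calling ToString.
        Console.WriteLine("x = " + 42);     // prints "x = 42"

        // UTF-8 concatenation: both operands are UTF-8 string literals,
        // so the result is a ReadOnlySpan<byte> holding the bytes of both.
        ReadOnlySpan<byte> utf8 = "ABC"u8 + "DEF"u8;
        Console.WriteLine(utf8.Length);     // prints "6"
    }
}
```

As the example in the diff shows, replacing either literal operand of the UTF-8 `+` with a previously initialized `ReadOnlySpan<byte>` variable would be rejected at compile time.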
