Commit 6655a67

Support UTF-8 string literals
1 parent 1a6760f commit 6655a67

1 file changed

Lines changed: 26 additions & 5 deletions

standard/expressions.md

@@ -4287,7 +4287,7 @@ Lifted ([§12.4.8](expressions.md#1248-lifted-operators)) forms of the unlifted
42874287

42884288
For an operation of the form `x + y`, binary operator overload resolution ([§12.4.5](expressions.md#1245-binary-operator-overload-resolution)) is applied to select a specific operator implementation. The operands are converted to the parameter types of the selected operator, and the type of the result is the return type of the operator.
42894289

4290-
The predefined addition operators are listed below. For numeric and enumeration types, the predefined addition operators compute the sum of the two operands. When one or both operands are of type `string`, the predefined addition operators concatenate the string representation of the operands.
4290+
The predefined addition operators are listed below. For numeric and enumeration types, the predefined addition operators compute the sum of the two operands. When one or both operands are of type `string`, or both are of type `ReadOnlySpan<byte>`, the predefined addition operators concatenate the string representation of the operands.
42914291

42924292
- Integer addition:
42934293

@@ -4335,15 +4335,15 @@ The predefined addition operators are listed below. For numeric and enumeration
43354335
```
43364336

43374337
At run-time these operators are evaluated exactly as `(E)((U)x + (U)y)`.
4338-
- String concatenation:
4338+
- UTF-16 string concatenation:
43394339

43404340
```csharp
43414341
string operator +(string x, string y);
43424342
string operator +(string x, object y);
43434343
string operator +(object x, string y);
43444344
```
43454345

4346-
These overloads of the binary `+` operator perform string concatenation. If an operand of string concatenation is `null`, an empty string is substituted. Otherwise, any non-`string` operand is converted to its string representation by invoking the virtual `ToString` method inherited from type `object`. If `ToString` returns `null`, an empty string is substituted.
4346+
These overloads of the binary `+` operator perform concatenation of UTF-16 strings. If an operand is `null`, an empty UTF-16 string is substituted. Otherwise, any non-`string` operand that is not a ref struct ([§16.2.3](structs.md#1623-ref-modifier)) is converted to its UTF-16 string representation by invoking the virtual `ToString` method inherited from type `object`. If `ToString` returns `null`, an empty UTF-16 string is substituted.
43474347

43484348
> *Example*:
43494349
>
@@ -4372,7 +4372,28 @@ The predefined addition operators are listed below. For numeric and enumeration
43724372
>
43734373
> *end example*
43744374
4375-
The result of the string concatenation operator is a `string` that consists of the characters of the left operand followed by the characters of the right operand. The string concatenation operator never returns a `null` value. A `System.OutOfMemoryException` may be thrown if there is not enough memory available to allocate the resulting string.
4375+
The result of the operator is a `string` that consists of the characters of the left operand followed by the characters of the right operand. The string concatenation operator never returns a `null` value. A `System.OutOfMemoryException` may be thrown if there is not enough memory available to allocate the resulting string.
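
The null-substitution and `ToString` rules above can be illustrated with a minimal sketch. It is not part of the commit; the variable names are hypothetical, and it assumes a compiler that supports top-level statements and nullable annotations (C# 9 or later).

```csharp
using System;

// Hypothetical operands used only for illustration.
string? missing = null;
object? boxedNull = null;

// string + string: a null operand is replaced by an empty string.
string a = "abc" + missing;

// string + object: the non-string operand is converted via ToString().
string b = "n = " + 42;

// object + string: a null object operand is also replaced by an empty string.
string c = boxedNull + "xyz";

// Both operands null: the result is an empty string, never null.
string d = missing + missing;

Console.WriteLine(a);          // abc
Console.WriteLine(b);          // n = 42
Console.WriteLine(c);          // xyz
Console.WriteLine(d.Length);   // 0
```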
4376+
- UTF-8 string concatenation:
4377+
4378+
```csharp
4379+
ReadOnlySpan<byte> operator +(ReadOnlySpan<byte> x, ReadOnlySpan<byte> y);
4380+
```
4381+
4382+
This overload of the binary `+` operator performs concatenation of UTF-8 string literals and the concatenated results thereof (which is much more restrictive than for UTF-16 string concatenation). The operands shall be UTF-8-encoded values.
4383+
The result of the operator is a `ReadOnlySpan<byte>` that consists of the bytes of the left operand followed by the bytes of the right operand. The result may be used directly as an operand to the UTF-8 string concatenation operator.
4384+
4385+
> *Example*:
4386+
>
4387+
> <!-- Example: {template:"standalone-console", name:"AdditionOperator2", expectedErrors:["CS9047","CS9047"]} -->
4388+
> ```csharp
4389+
> ReadOnlySpan<byte> sp1 = "ABC"u8 + "DEF"u8; // OK
4390+
> ReadOnlySpan<byte> sp2 = sp1 + "DEF"u8; // error
4391+
> ReadOnlySpan<byte> sp3 = "ABC"u8 + "DEF"u8 + "123"u8; // OK
4392+
> ReadOnlySpan<byte> sp4 = "ABC"u8 + (ReadOnlySpan<byte>)stackalloc byte[]
4393+
> { (byte)'D', (byte)'E', (byte)'F', (byte)'\x0' }; // error
4394+
> ```
4395+
>
4396+
> In the case of `sp1`, both operands are UTF-8 string literals. However, once `sp1` is initialized, that UTF-8 pedigree is no longer tracked. That is, `sp1` itself is not seen as being UTF-8 encoded. As such, it is not permitted to be an operand in the case of the initialization of `sp2`. In the initializer for `sp3`, the left pair of operands is evaluated, and as they are both UTF-8 string literals, the result is deemed to also be UTF-8 encoded, so it can further be used as the left operand of the right operator. In the case of `sp4`, while both operands are `ReadOnlySpan<byte>`s, only the left operand is UTF-8 encoded, even though the `Span<byte>` returned by `stackalloc` has the internal form of a UTF-8 string literal (that is, an array of bytes with a null-byte terminator). See [§6.4.5.6](lexical-structure.md#6456-string-literals). *end example*
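
As a complement to the example in the diff, the following sketch checks the "left bytes followed by right bytes" rule for the permitted cases only. It is not part of the commit; the variable names are hypothetical, and it assumes a compiler and runtime that support UTF-8 string literals (C# 11 or later).

```csharp
using System;

// Both operands are UTF-8 string literals, so concatenation is permitted.
ReadOnlySpan<byte> joined = "ABC"u8 + "DEF"u8;

// A concatenated result may be used directly as an operand of a further
// concatenation within the same expression.
ReadOnlySpan<byte> chained = "ABC"u8 + "DEF"u8 + "123"u8;

// The result holds the bytes of the left operand followed by the bytes
// of the right operand, so it matches the equivalent single literal.
bool sameBytes = joined.SequenceEqual("ABCDEF"u8);
bool sameChained = chained.SequenceEqual("ABCDEF123"u8);

Console.WriteLine(joined.Length);    // 6
Console.WriteLine(chained.Length);   // 9
Console.WriteLine(sameBytes);        // True
Console.WriteLine(sameChained);      // True
```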
43764397
- Delegate combination. Every delegate type implicitly provides the following predefined operator, where `D` is the delegate type:
43774398
43784399
```csharp
@@ -7296,7 +7317,7 @@ A *constant_expression* of type `nint` shall have a value in the range \[`int.Mi
72967317

72977318
Only the following constructs are permitted in constant expressions:
72987319

7299-
- Literals (including the `null` literal).
7320+
- Literals (including the `null` literal, but excluding UTF-8 string literals).
73007321
- Constant interpolated strings.
73017322
- References to `const` members of class, struct, and interface types.
73027323
- References to members of enumeration types.
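
A minimal sketch of what the exclusion of UTF-8 string literals means in practice. It is not part of the commit; the names `Greeting` and `greetingUtf8` are hypothetical, and C# 11 or later is assumed.

```csharp
using System;

// An ordinary string literal is a constant expression, so it can
// initialize a local constant.
const string Greeting = "hello";

// A UTF-8 string literal is not a constant expression, so it is used
// here only as the initializer of an ordinary (non-const) local.
ReadOnlySpan<byte> greetingUtf8 = "hello"u8;

Console.WriteLine(Greeting.Length);       // 5
Console.WriteLine(greetingUtf8.Length);   // 5
```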
