Commit b9c4f77
authored
gh-136595: Normalize surrogate pairs in REPL input to fix UnicodeEncodeError on Windows
The new REPL implementation (_pyrepl) crashes on Windows when the user inputs Unicode characters outside the Basic Multilingual Plane (≥ U+10000), such as emoji (e.g. 🐍). This happens because the Windows input layer provides surrogate pairs (UTF-16 code units) that _pyrepl attempts to process and tokenize directly, leading to unpaired surrogate handling issues.
This commit introduces a `normalize_surrogates()` helper in `Reader` to explicitly normalize surrogate pairs by encoding to UTF-16 with 'surrogatepass' and decoding back. The `get_unicode()` method is patched to use this normalization so that any code consuming REPL input (e.g. syntax highlighting via tokenize) receives valid Unicode text.
This resolves UnicodeEncodeError crashes in the REPL when typing emoji or other non-BMP characters on Windows.
Fixes #1365951 parent b74fb8e commit b9c4f77
1 file changed
Lines changed: 8 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
43 | 49 | | |
44 | 50 | | |
45 | 51 | | |
| |||
759 | 765 | | |
760 | 766 | | |
761 | 767 | | |
762 | | - | |
| 768 | + | |
| 769 | + | |
0 commit comments